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Abstract 


This report explores the use of grouping in object recognition by compu¬ 
tational systems. Many recognition systems in the past have performed an 
undirected search through the space of different segmentations of an image 
in order to recognize objects. This approach leads to significant problems 
of computational complexity and accuracy. The process of grouping deter¬ 
mines the sections of an image most likely to come from a single object. This 
can tell a recognition system which segmentations of the image to consider 
first, improving both its speed and accuracy. 

The report describes a particular recognition system called GROPER. GROP¬ 
ER performs grouping by using distance and relative orientation constraints 
that estimate the likelihood of different edges in an image coming from the 
same object. The thesis presents both a theoretical analysis of the group¬ 
ing problem and a practical implementation of a grouping system. And 
it discusses the relevance of this theory of grouping to human psychology. 
GROPER also uses an indexing module to allow it to make use of knowl¬ 
edge of different objects, any of which might appear in an image. We test 
GROPER by comparing it to a similar recognition system that does not use 
grouping. 


Thesis Supervisor: Prof. W. Eric L. Grimson 
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Chapter 1 


Introduction 


1.1 Why Grouping? 

This thesis argues for the use of grouping in model-based, visual object recognition. Ac¬ 
cording to this thesis, a computer attempting to recognize objects in an image should begin 
by determining what sections of the image seem likely to have come from a single object. 
We will call this process grouping. Only after performing grouping should the computer 
system try to determine which objects might have produced these sections of the image. 
This thesis follows the work of Lowe[l8], who built the first computational recognition 
system to explicitly make use of grouping. 

By model-based, visual object recognition I mean the process we employ in day to day 
life, such as when I look at my desk and recognize the stapler, the text book, and the piles 
of papers, failing to locate my pen. Grouping means deciding that some segment of this 
cluttered scene all comes from a single object. For example, suppose I notice that something 
silver, with a red blob at the end, is sticking out from under my notebook. Perhaps, before 
I recognize what that thing is, I decide that it is probably all one thing; that the silver 
part of the image and the adjacent red part of the image probably both came from just 
one object. That decision would belong to the grouping process, as I will use the term 
“grouping”. I might then use the shape of the silver and red blobs, and their relationship, 
to deduce that the thing is a pencil. If so, I have recognized the pencil with the assistance 
of grouping. The reader will not necessarily find it obvious that we actually use grouping 
when we recognize objects; this example only illustrates how one might use grouping as 
part of the recognition process. 

A more detailed example will help explain this process better. It will show how this 
thesis formulates the object recognition problem, and how grouping can play a part in a 
model-based recognition system. It will also demonstrate three advantages of using grouping 
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Figure 1.1: Models of Geometric Shapes 

to recognize objects. First of all, grouping can help us to avoid mistakes we might otherwise 
make. Secondly, it can speed up recognition by reducing the number of possibilities we must 
consider. And thirdly, it can provide more information to use in determining what objects 
we must consider. A comparison between recognition that uses grouping and recognition 
that does not make use of grouping will demonstrate these advantages. 

Suppose we wish to use a computer to recognize geometric shapes. The computer will 
begin with two kinds of information. First of all, it will begin by knowing about some 
geometric shapes, which it will try to recognize. It might, for example, start off knowing 
about the shapes in figure 1.1. Secondly, the computer will receive an image that contains 
some of these shapes, such as the image shown in figure 1.2. This example, though simple, 
will demonstrate some characteristics of more complicated object recognition problems. 
So in this image, some shapes overlap others, covering up part of their perimeter. And 
sections of the perimeter of some shapes do not appear, just as parts of an object may 
blend into the background, producing an indistinct outline. The problem then has these 
two inputs: models of objects to recognize, and an image that may contain some of these 
objects, partially obscured, in unknown positions. 

For its output, we want the computer to locate the shapes that it knows about as quickly 
and accurately as possible. 

We can do this by interleaving a grouping phase, and a recognition phase. We first use 
grouping to determine what edges in the image seem to have come from a single geometric 
shape. Then in the recognition phase, we determine which, if any, of the known shapes 
could have produced the edges that the grouping phase has lumped together. Based on the 
results, we will decide how to proceed. 
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Figure 1.2: A Sample Scene 



Figure 1.3: The highlighted edges look like they all came from the same object. 



Figure 1.4: These edges also appear to have come from a single object. 
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So we might begin by deciding that the edges highlighted in figure 1.3 seem to have all 
come from a single object. (How we make decisions like that will take up a large portion 
of this thesis). We then compare these edges to the shapes in figure 1.1, and successfully 
identify shape number 4. We might then decide that, of the remaining edges in the image, 
the ones highlighted in figure 1.4 appear to have all come from a single object. We compare 
these edges to the shapes we know about, and conclude that they must have come from 
shape number 1. This vague description of an approach to model-based object recognition 
highlights two main problems on which this thesis will concentrate: how we decide which 
groups of edges all came from a single object, and how we decide from which object a group 
of edges came. 

If we can solve these two problems, we will have a system with some considerable advan¬ 
tages over other approaches to model-based object recognition. To show this, I will propose 
a straw man recognition system that makes no use of grouping at all. Most computational, 
model-based object recognition systems make use of some type of grouping, at least im¬ 
plicitly. But without any grouping system, we would have to search randomly through the 
space of all possible collections of edges, until we found one in which all the edges really 
did come from a single shape. We might start out selecting two edges at random from the 
image. We would then compare these edges to our shape models. If we could not find any 
known shape that could have produced these two edges, we would give up on them and try 
another pair of edges. On the other hand, suppose we knew of three different shapes that 
could have produced that randomly chosen pair of edges. Then we would randomly select 
a third edge, and see if any of the three objects might have produced that edge as well. We 
would continue in that way, until we came up with some collection of edges that only one 
shape might have produced, and in that way succeed in recognizing a shape in the image. 

The next chapter will discuss these two approaches to model-based recognition in more 
detail. But the brief example given allows us to discuss three significant advantages of 
having a grouping system. 

First of all, a good grouping system can quickly lead us to the right answer. Instead 
of searching at random for good combinations of edges, a grouping system is designed to 
tell us which combinations are most likely all to have come from a single object. We can 
perform grouping in less time than it would take to compare all different combinations of 
edges to our models, because grouping does not depend on the objects that we are looking 
for, and because it is a local visual process. 

Secondly, grouping can save us from making certain kinds of mistakes. Suppose, without 
the assistance of grouping, we ended up wondering which object might have produced the 
edges highlighted in figure 1.5. When we compared these edges to our models of shapes, 
we would find that only shape 1 might have produced them all. This would provide us 
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Figure 1.5: If we try collections of edges at random in our search for objects we might 
consider the edges highlighted. Of the four objects we know about, only object one could 
have produced all these edges. 

with a strong reason to locate shape 1 at that position in the image. We would be wrong. 
Intuitively, people do not make that mistake because those particular edges do not look as 
if they all came from the same object. And there are additional edges in the image that 
seem to go with each of the three edges in this combination. 

This example may seem a bit contrived. After all, what are the chances that some 
randomly chosen edges in an image, which do not really all come from the same object, will 
nonetheless look just like some object we know about? In fact, if we know about thousands 
of different objects, all of which have flexible parts, there is a good chance that for any small 
random collection of edges we can find some object we know about, in some position that 
might have produced those edges. Without grouping, this will complicate the recognition 
process and increase the chances of errors occurring. 

Thirdly, grouping provides a more advanced starting point for the recognition process. 
When we recognized shape 1 from the edges highlighted in figure 1.4, we began with four 
edges provided for us by the grouping process, and only then looked at our models to decide 
which shape might have produced those edges. Because we already had a collection of four 
edges, we could use the relationship between these edges to index into our fist of known 
shapes and weed out the ones that could not possibly match. For example, we might notice 
that those four edges produce two ninety degree corners. Of the four shapes we know of, 
only two contain two ninety degree corners. In this way we can quickly find that we need 
to consider only half of the shapes. 

Grouping has the potential to improve the speed and accuracy of a model-based object 
recognition system. It can speed things up by leading us quickly to large collections of edges 
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Figure 1.6: A picture of six objects. 

that all come from a single object. This will cut down on the number of false possibilities we 
must try out, and at the same time allow us to index into a library of different objects that 
might have produced the edges, cutting down on the number of objects we must consider. 
Grouping can improve the accuracy of a recognition system by providing additional evidence 
for the existence of a suspected object in the image. Instead of only considering whether an 
object could have produced some collection of edges, we may also consider whether in fact 
those edges really seem to have all come from the same object. As we address complicated 
recognition tasks, we will need both of these benefits more and more. 

Figures 1.6 and 1.7 show the results of a grouping-based recognition system. It success¬ 
fully finds all the objects in the image, considering a total of only 8 different collections of 
edges in the process. 


1.2 The Structure of the Thesis 

The rest of this thesis will elaborate the points hinted at in this introduction. Chapter 2 
will provide a more complete overview of this work. Chapter 3 will present a theory of 
computation for grouping. To perform grouping, we would like to know the probability 
that some edges all come from a single object, given their relative positions. Due to the 
difficulty of accurately modeling the real world, this chapter presents a model of a simplified 
world. It then draws some conclusions about that simplified world which suggest ways of 
approximating the real world probabilities about which we would like to know. 

Using this theory of grouping, chapter 4 will investigate the relationship between the 
suggestions for computational grouping presented in this thesis and human grouping per- 


12 




Figure 1.7: The objects a grouping based recognition system finds in the previous image. 
To find these six objects, it had to consider only eight different collections of edges. 

formance. When presented with images consisting of edges, people tend to see some of 
these edges grouped together. Chapter 4 will show that the theory of grouping correctly 
predicts some human grouping experiences. 

Chapter 5 will describe a grouping algorithm based on this theory. This algorithm 
largely follows the lines suggested by the theory of grouping, but it does introduce some 
simplifications and fills in some of the holes left open by the theory. 

Chapter 6 will discuss a model-based recognition system designed to work with grouping. 
It will describe a recognition module that indexes into a data base of different object models. 
That is, given a group of edges, this module looks in a table to see which objects might 
have produced those edges, instead of searching through the set of all possible matches of 
image edges to model edges. 

Chapter 7 will briefly describe a verification step that can provide additional evidence 
for a match between some image edges and model edges. This verification step is simple, 
and based on similar work done by other researchers. 

Chapter 8 will describe some work done by other researchers in model-based object 
recognition. In particular, it will point out the debt this work owes to that of David Lowe. 
In some respects, this work is an extension to Lowe’s work on grouping. It will also point 
out that many other previous recognition systems work well because they implicitly use a 
certain amount of grouping. And it will discuss some other approaches to indexing. 

Finally, chapter 9 will evaluate the results of the entire recognition system. It will show 
the performance of this system on real images. It will discuss how successfully the grouping 
system judges which edges should go together, and to what extent this grouping reduces 
the amount of computation needed to recognize objects. 
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Chapter 2 


Overview 


This chapter will discuss more precisely some of the notions mentioned in the introduction. 
It will describe a limited subset of the object recognition problem within which this thesis 
will operate. It will also more precisely define grouping and indexing, concepts that the 
previous chapter introduced at only an intuitive level. These definitions will allow a more 
careful argument in favor of grouping to develop, an argument that will contrast this ap¬ 
proach with the approach taken in other object recognition systems. Finally, the end of the 
chapter will discuss what we may reasonably expect from a grouping system, explaining 
why we might hope to be able to build a recognition system using grouping. 

2.1 What is Object Recognition? 

Object recognition makes use of images of the world and knowledge about objects in order 
to locate objects. The knowledge we use in recognizing objects may range widely, from the 
specific shape of Jim Rice’s face, to the fact that he often appears in the outfield at Fenway 
Park, but rarely at Shea Stadium. Images also contain a wide range of effects. Intrinsic 
qualities of objects produce differing light intensities throughout an image, but changes 
in illumination also effect these intensities. Often the perimeters of objects create sharp 
changes in intensity in an image. But such changes also occur due to shadows, specularities, 
and the texture or surface markings of objects. And sometimes objects’ perimeters do not 
create sharp changes in intensity, because an object’s surface material appears to blend into 
the background, or because another object stands between the first object and the viewer. 
We can characterize visual object recognition as the process that combines any knowledge 
of the world that the viewer may have with a continuous sequence of images of overlapping 
three dimensional objects subject to varying lighting conditions, in order to locate known 
objects in the world. 
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The wide variety of possible inputs to the object recognition problem make it difficult 
to solve. For that reason, this thesis will discuss an object recognition system, GROPER 
(GRouping-based Object PERception), that focuses on a more limited version of the recog¬ 
nition problem. Making progress on even a subset of the recognition problem may prove 
valuable because practical applications exist for solutions to limited recognition problems, 
and because progress on these problems may provide insight into how to address the more 
complete recognition problem. This thesis addresses recognition making use of only simple, 
well defined knowledge of the world, and handling images with only some of the complexity 
of the images that humans routinely handle. 

Instead of making use of many kinds of knowledge of the world, GROPER will use only 
precise geometric knowledge about objects. It will know the exact size and shape of the 
objects it seeks, but it will not use any other kind of knowledge directly, including even the 
color or texture of the material that composes these objects. GROPER might, for example, 
know exactly what a specific stapler looks like. But it would not make use of some more 
general model to recognize a slightly different stapler. Moreover, one of GROPER’s models 
of an object will only depict that object in a specific pose. So GROPER’s model of a stapler 
might allow it to recognize the stapler in an ordinary position, but not when the top of the 
stapler is open so that it may be loaded with new staples. And GROPER would know how 
the stapler looked from a certain distance away, but would not recognize the stapler from 
a significantly greater or lesser distance. 

This simplification of the problem will not effect the grouping portion of GROPER 
because grouping will not make any use of knowledge about particular objects. If an 
unknown object with flexible parts appears in the image, the grouping module will be just 
as likely to find edges that all come from that object as it will for a known object. But this 
limitation on knowledge of objects will greatly reduce the problems involved in building the 
recognition component of GROPER. 

GROPER will also make use of two important simplifications in the types of images on 
which it will work. The research described in this thesis assumes that only two dimensional 
objects will appear in the image, such as flat machine parts. And the theory of grouping 
developed only addresses the problem of how to group together the occluding edges of 
objects that appear in the image, although GROPER does not assume that it knows on 
which side of an occluding edge the object lies, and which side is the background. Limiting 
the objects expected to two dimensions simplifies both the recognition and the grouping 
portion of GROPER. And the use of only occluding edges makes it much easier to decide 
what edges come from a single object. 

Finally, GROPER will use only a single image of a scene when it looks for objects. We 
could gain much additional information about a scene from a series of images taken over a 
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period of time during which either the objects or the viewer moves. 

GROPER does address some aspects of the recognition problem that many compu¬ 
tational object recognition systems do not. GROPER will make use of knowledge of a 
library of different objects, many of which look quite similar, or have common subparts. 
And GROPER will recognize objects in scenes that contain about a dozen similar objects. 
These types of situations cause difficulties of accuracy and computational complexity. These 
difficulties will only grow more acute as researchers deal with recognition problems closer 
to the task that humans perform. They provide the motivation for GROPER’s approach 
to recognition. 


2.2 A Recognition Algorithm 

GROPER’s recognition algorithm makes use of three main components: grouping, indexing, 
and verification. Subsequent sections will describe these components in more detail, but 
first this section will explain how they fit together. In brief, grouping provides GROPER 
with groups of intensity edges that seem likely all to come from a single object. Indexing 
takes a group of intensity edges, and decides which edges from known objects might have 
produced these image edges. Verification then takes one of these matches between an object 
model’s edges and image edges, and makes a final decision about whether that object might 
have produced those image edges. 

GROPER uses these components in the following algorithm: 

1. Extract straight line approximations to all the intensity edges in an image. 

2. Using grouping, choose the group of edges thought most likely to have come from a 

single object. 

3. Using indexing, determine which collections of model edges might match this group of 

image edges. 

4a. If no modeled edges could match the image edges chosen, return to step two, using 
grouping to choose the next most promising group of edges. 

4b. If only a few sets of model edges match these image edges, perform a verification 
step on these hypothesized matches. Extract the edges produced by any successfully 
recognized objects from the image, and return to step two. 

4c. If a group could match many different sets of object edges, try to extend the group 
and narrow down the possibilities by using grouping to add more edges that seem to 
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come from the same object as the originally chosen edges. This will lead either to the 
situation in step four a or step four b. 

This algorithm will help to explain why GROPER requires certain kinds of results from 
grouping, indexing and verification. 


2.3 What is Grouping? 

The introduction argued that grouping can help overcome problems of inaccuracy and 
complexity. This section will explain more clearly what grouping means. It will make three 
main points. First of all, this section will explain operationally what grouping does: what 
it begins with, and what it produces. Next, this section will explain how GROPER can 
perform grouping in a way that is independent of any particular library of objects for which 
it looks. Finally, this section will provide an overview of how GROPER does grouping. 

First of all, we must distinguish the type of grouping that GROPER performs from a 
kind of feature detection that other authors sometimes refer to as grouping. Some authors 
use the word grouping to describe the process of making straight line approximations to 
the intensity edges in an image, or finding corners or other types of simple features in the 
intensity edges an image produces. This thesis does not mean that. In fact, GROPER uses 
straight line approximations to image edges as the input to its grouping component. By 
“grouping”, we mean a process that attempts to locate collections of these straight lines 
that all seem to come from a single object. 

The research described in this thesis only addresses the problem of grouping together the 
intensity edges produced by the perimeter of an object. We will not discuss, for example, 
how one might decide whether two different homogeneous chunks of an image probably 
come from the same object. For this reason, GROPER uses as input the results of an 
intensity edge detector, such as the Marr-Hildreth edge detector (Marr[20]). These edges 
make good candidates for the location of the perimeter of objects in an image. 

Because this thesis only explores a serial algorithm for recognition, GROPER tries the 
best available groups of edges, one at a time, to see if each will help it to recognize an 
object. Several different factors make a group of edges good for a recognition system, but 
we will focus on only one of these, the fact that the greater the chances are that a group 
of edges comes from a single object, the more likely it is to lead GROPER to correctly 
identify an object. We might also select a group of edges because if the edges do, in fact, all 
come from a single object, then that group is also likely to lead quickly to the identification 
of that object. For example, we might feel that a particular group of two edges has a 
slightly greater chance of coming from a single object than a different group of five edges, 
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but that the group of five edges will provide us with more information which we can use 
to decide what object produced those edges. Or, we might want a grouping system to 
favor groups of edges especially suitable for our indexing scheme. Lowe[18] makes this a 
primary consideration in his grouping system. So three possible factors could influence 
one’s criteria for grouping together edges: the edges’ likelihood of coming from the same 
object, the extent to which they will narrow down our search if correct, and their usefulness 
in indexing. Of these three, GROPER emphasizes the first criteria; its output consists of 
the collection of edges deemed most likely to come from a single object. 

Also, grouping must be relatively fast. Obviously, one way to perform grouping would 
be by looking for objects in the image, and grouping together all the edges that appear to 
come from the same object. To be a useful first step in object recognition, it must take 
much less time to group together edges than it would take to find objects without grouping. 

Notice that to perform this function, grouping does not need to partition the edges in an 
image. The best collection of edges to try when looking for objects might combine an edge 
with some others, while another good possibility combines that edge with a different set of 
edges. So grouping does not need to divide image edges into non-intersecting sub-groups, 
but rather to produce an ordered collection of groups, which may partially overlap. 

To determine the likelihood of a group of edges in an image coming from the same 
object, we must have a model of the world which produced those edges. In different kinds 
of worlds, we might approach grouping in different ways. In a world containing randomly 
oriented, flat squares, for example, we would only group together parallel or perpendicular 
edges. In a world with randomly shaped objects, on the other hand, we might not use the 
angle between edges at all in deciding their likelihood of coming from the same object. So 
the model we have of the world will have an important influence on the strategy we adopt 
for grouping. 

The real world does not have a simple description. It has a huge variety of objects which 
appear, not according to some simple random distribution, but in locations influenced by 
a variety of factors. For example, objects need support or they fall down, and screwdrivers 
appear near hammers more often then they appear near elephants. We could not possibly 
come up with a complete model of the real world which accurately reflected all such facts. 

So instead, GROPER makes use of a model of a simplified world of two dimensional, 
polygonal objects with convex subparts. The conclusions made about this world, however, 
seem to still apply well to the real world, as we will see later. The point we make here 
is that GROPER’s model of the world does not involve specific objects. Within certain 
general constraints, GROPER’s world model could include any set of objects. 

We might instead have used the library of objects for which we are looking to decide 
how to perform grouping. With this approach, we might look for general characteristics 
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Figure 2.1: GROPER groups together the closest two collections of edges. 


of our object library, such as particular angles or other relationships between edges which 
frequently occur. Turney et. al.[24] and Knoll and Jain[l5] have taken this approach to 
some extent. GROPER does not, for several reasons. First of all, in order to use different 
libraries, this approach would require us to automatically analyze each particular collection 
of objects. Some libraries might provide particularly useful characteristics on which to base 
grouping, while others might not. Secondly, large libraries of objects with flexible parts 
will require quite general models of the world. With such libraries we can depend less on 
finding a fortuitous set of characteristics which their objects have in common. We may, 
nonetheless, still expect to find some generalizations which will apply to almost any library 
of real objects. For example, we may find that almost any such library will tend to have 
objects with a large number of parallel edges. However, a third reason for first analyzing 
a very general model of objects is that the constraints that prove useful for grouping with 
such a model will probably also apply to more specific sets of objects. We can then add 
extra constraints suited to a particular domain, such as the tendency of objects to produce 
many parallel edges. 

The general constraints which GROPER uses involve the distance between groups of 
edges and their relative orientation. For example, GROPER will decide that two groups 
of edges that are close together are more likely to come from a single object than two 
groups that are far apart, all other factors being equal. In figure 2.1 GROPER will find 
the closest pair of groups of edges most likely to come from one object. This principle may 
seem intuitively obvious, and in fact Gestalt psychologists have noticed that people group 
together edges on this basis. But this intuition leaves many questions unsettled, such as 
exactly how the likelihood of edges coming from the same object changes, as the distance 
between them changes. We can get a sense of these details by analyzing a general model 
of the world. 

Similarly, we can draw some conclusions about the effect of the orientation of two groups 
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Figure 2.2: In each image, GROPER prefers the combination of groups A and B over the 
combination of groups A and C, due to the groups’ relative orientation. 

of edges on the likelihood that they come from a single object. In brief, we find that this 
likelihood depends on the degree to which two groups of edges point at each other. Figure 
2.2 depicts two examples in which GROPER will choose one pair of groups of edges over 
another because of their orientation. Together, these distance and orientation constraints 
form the basis for GROPER’s grouping system. 


2.4 What is Indexing? 

In GROPER, indexing takes a group of edges produced by the grouping component and 
decides which edges from what objects might have produced them. An object might produce 
a group of image edges if, for some possible position and orientation of the object in the 
scene, its perimeter aligns with the edges in the image, within some error bounds that reflect 
the imperfection of the sensing and edge detection. Two further requirements emerge from 
the way GROPER uses indexing: GROPER can tolerate certain kinds of mistakes more 
readily than others, and GROPER requires a flexible indexing scheme that will work on 
any set of input edges. 

Verification will correct some mistakes that indexing might make, while it can not 
correct other types of mistakes. In particular, if indexing produces some matches that 
really will not work out, verification will discover that, preventing GROPER from making 
a false positive identification of an object. However, if GROPER’s indexing fails to produce 
a correct match between a group of image edges and an object, then GROPER may lose 
forever the chance to identify that object. As a consequence of this, indexing must produce 
all feasible matches between the group of edges used as input and the object models in its 
library, but it may also produce some matches that are not really feasible. 
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The edges which GROPER groups together have no definite characteristics. In contrast, 
Schwartz[22] performs indexing using connected sections of edge, while Lowe[18] groups 
together edges that have a special relationship, such as parallelism or cotermination. So 
GROPER needs an approach to indexing that can work for any arbitrary collection of edges. 

We use the term indexing to describe this whole process because we use the relationships 
between the edges in a group to compute an index of a table. Using this index, we look in 
the table to find the edges of object models that have the same index, and thus the same 
interrelationship. For example, we might compute the angle between two image edges. 
Using this angle we can then look in a table to find all pairs of modeled object edges 
which form that angle. This narrows down the possible pairs of object edges we must 
consider as potential matches. GROPER computes five different parameters that describe 
the relationship between two image edges, and uses all of them for indexing. By using 
indexing to narrow down the number of possible matches, GROPER greatly reduces the 
amount of computation needed to recognize objects. 


2.5 What is Verification? 

GROPER uses verification when it already has a good idea of the location of one of the 
objects in its library, and it wants to make a final decision about whether to accept this 
hypothesis. Verification makes use of all available evidence. Given a match between image 
edges and object edges, it estimates the position of the object in the scene. It then looks 
through all the image edges, not just the ones in the match, to see which ones come close 
enough to edges of the object model in its suggested position. GROPER then makes a final 
decision as to whether it has found enough evidence to support the hypothesis in question. 
It does this, rather arbitrarily, based on whether the edges found account for twenty-five 
percent of the object’s perimeter. 

Many computer object recognition systems have made use of similar types of verifi¬ 
cation, and GROPER does not contribute much that is new in this area. Experiments 
with GROPER, however, have suggested some shortcomings in this type of verification. In 
particular, the grouping step was designed so that GROPER would first consider only the 
combinations of edges most likely to have come from a single object. Verification has no 
such provision. As a result, verification may result in finding a number of edges that align 
with a hypothesized object, but that do not all come from a single object. This may re¬ 
sult in the incorrect identification of objects. Even when GROPER does correctly identify 
an object, it frequently decides incorrectly that extra image edges came from that object, 
and removes them from consideration when it looks for additional objects. This can cause 
GROPER to miss some objects later. 
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2.6 Why Grouping? 

We can now present a more formal argument for the importance of using grouping in object 
recognition. This argument again relies on the problems of inaccuracy and complexity that 
can arise as we deal with more and more complex object recognition tasks. We will argue 
that grouping overcomes serious problems in the domain in which GROPER operates by 
contrasting GROPER with an object recognition system that does not use grouping. And 
we will point out that these problems will become much worse in more complex domains. 

If an object recognition system makes use of no principles that tell it which groups of 
edges to consider first, then it must search through randomly chosen collections of edges 
in order to find objects. We have implemented a recognition system called SEARCHER 
which does just that. SEARCHER operates identically to GROPER, only without grouping. 
Instead, SEARCHER uses a backtracking search. 

Although we can take SEARCHER as an indication of how a recognition system will 
do without any grouping, most recognition systems do not work that way. Instead, they 
implicitly make use of grouping, or take a quite different approach, as chapter 8 will dis¬ 
cuss. But, while SEARCHER is something of a straw man compared to existing recognition 
systems, the problems it illustrates have mainly been attacked with the use of grouping. 
GROPER differs from most other recognition systems not in using grouping, but in explic¬ 
itly analyzing the grouping problem in order to do as much and as effective grouping as 
possible. 

When SEARCHER randomly chooses groups, it usually picks groups of edges that do 
not all come from the same object. But often these edges still line up with some known 
object, at some orientation, causing SEARCHER to pursue false leads, and frequently to 
incorrectly recognize objects not in the scene. Typically the mistakes SEARCHER makes 
seem ludicrous to a human for one of three reasons. One is that SEARCHER decides an 
object is on one side of an edge, while a human observer can clearly see that that side 
is background. SEARCHER, like GROPER, does not start off knowing which side of an 
edge is figure and which is background. A second is that sometimes in its interpretation 
SEARCHER uses two different edges that to a person seem to clearly come from two 
different objects. A third reason for some of SEARCHER’S failures is that it may make use 
of some edges that do seem to come from a single object, without also using some other 
edges that clearly come from the same object. These difficulties result in SEARCHER 
making a large number of false identifications when it operates in GROPER’s domain. 
Figure 2.3 gives an example of the objects that SEARCHER finds in an image. Figure 2.4 
highlights the three types of mistakes SEARCHER typically makes. 

A variety of reasons might explain why other recognition systems do not produce the 
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Figure 2.3: The left-most image shows the outlines of sixteen different objects about which 
SEARCHER knows. The picture in the upper right shows straight lines that approximate 
the edges in an image containing these objects. The lower right picture shows the objects 
SEARCHER found, and their location. 
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Figure 2.4: On the left are two objects that SEARCHER incorrectly found in the image. 
On the right, the edges SEARCHER used to recognize each object. SEARCHER found the 
top-most object due mainly to figure/ground reversals. In both cases, it combines some 
edges that do not seem to come from the same object, and leaves out some edges that do 
seem to come from the same object as some of the edges it has chosen. 

same inaccuracies that SEARCHER does. First of all, many systems use higher thresholds 
in deciding when to accept evidence as indicating the presence of an object. Higher thresh¬ 
olds would reduce the number of false positive identifications that SEARCHER makes, 
but would cause it to miss some objects that GROPER can find. Secondly, as previously 
noted, these systems often make use of a limited amount of grouping, which can improve 
accuracy. Thirdly, many systems assume that a preprocessing step can determine the side 
of an occluding edge on which the occluding object lies. This assumption would reduce the 
number of errors that SEARCHER makes, too. Fourth, many recognition systems operate 
in a domain less likely to produce false positive identifications of objects. If a system looks 
for one object instead of many, there is less chance that an arbitrary collection of edges will 
match an object it knows about. If a scene contains no objects that even slightly resemble 
the object for which the system is looking, it will take more of a coincidence to produce 
a set of edges that together look like that object. And if a system looks for a complex 
object with many edges, it will take more of a coincidence for unrelated image edges to 
line up with that object’s edges than it would if a system looked for a simple object with 
fewer edges. All these reasons make it easy to understand the reliability of existing object 
recognition systems. The fact that SEARCHER looks for a number of simple and quite 
similar objects, with no ability to tell figure from background, increases its failure rate. 

But we can see that even greater problems will emerge when we address domains more 
complex than that in which GROPER operates. When a system knows about more objects, 
which can bend or assume any three dimensional orientation at any distance from the 
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viewer, it becomes much more likely that a random assortment of edges will match some 
known object in some possible pose. Furthermore, w hil e accurate ways exist of telling figure 
from ground in some special, industrial settings, we do not yet have a fully general method 
for performing this task. This growing problem of accuracy provides one of the motivations 
for the development of methods of grouping that axe as general and accurate as possible. 

Effective grouping also reduces the amount of work needed to recognize objects. Clearly, 
the more quickly we find groups of edges that come from a single object, the more quickly 
we can recognize an object. However, the degree to which grouping speeds up recognition 
depends on how a recognition system would otherwise structure the search for objects. 
With the most straight-forward search, using a library of known objects will increase the 
cost of that search linearly with the number of known objects. By making use of indexing 
we can reduce that cost, but problems still arise that grouping can help diminish. 

We might look at recognition as a search for objects through the space of all sets of 
image edges matched to sets of edges from an object model. This space is huge, but of 
course no recognition system needs to explicitly explore it all, since a backtracking search 
eliminates most possibilities without explicitly considering them. However, since the search 
proceeds independently for each object in the library, it increases linearly with the size of 
the library. 

By using indexing we can improve on this linearly rising cost. With indexing, we must 
deal only with the search space of all sets of image edges. For each set of image edges, we 
perform a single indexing step that tells how many sets of model edges might have produced 
this set of image edges. The cost of an indexing step may rise with the number of objects 
in the library, depending on how we perform indexing. A later chapter will discuss this 
issue in more detail. But in general we can keep this cost low, reducing our search to one 
through the space of all sets of image edges. It might seem that with indexing, the cost of 
recognition does not increase at all with the number of objects in the library, except in so 
far as this effects the cost of a single indexing step, because the number of sets of image 
edges does not depend on the number of objects in a library. 

However, if we make use of a backtracking search, this is not true. A backtracking 
search benefits us because it takes advantage of the fact that when we find a group of 
image edges that no set of model edges might have produced, we never need to consider 
any other set of image edges containing this set as a subset. The more objects we have 
in the library, the greater the chances that a randomly chosen set of edges will match the 
edges of some model in the library. This means that we will have to pursue false leads 
longer before we can backtrack. With only one object in the library, we might primarily 
have to consider only sets of two or three image edges. These small sets usually would not 
match the object in the library, unless they were really leading to a correct solution. Our 
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Figure 2.5: A comparison of the number of combinations of edges which SEARCHER and 
GROPER must explore, as well as the relative effectiveness of the two systems, on a set of 
test images. Chapter 9 contains the results of other sets of experiments, as well. 


search then, would be roughly proportional only to the square or cube of the number of 
image edges. With a large library, however, sets of two or three edges will usually match 
some object in the library, causing us to explore the search tree more deeply before we can 
backtrack. This could make our search space proportional to the fourth or fifth power of 
the number of image edges, or larger. A backtracking search using indexing will take an 
amount of time that depends on the chances that a randomly chosen set of image edges 
matches some object in the library of object models. This will depend not only on the size 
of that library, but also on whether it contains flexible objects, and on the types of scenes 
used. But clearly, more complex libraries will result in a much more expensive search for 
objects. 

This discussion indicates that indexing alone will not solve the complexity problems 
produced by large libraries of objects. Grouping can overcome these problems. Table 2.5 
indicates that GROPER explores dramatically fewer sets of image edges than SEARCHER 
in order to recognize objects. GROPER’s grouping component proves effective in ordering 
the search for objects, while SEARCHER can only search at random. This allows GROPER 
to consider less than .25% of the possibilities that SEARCHER does. GROPER performs 
more quickly and accurately than SEARCHER because it more rapidly tries collections of 
edges that were produced by the objects for which it looks. 
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Figure 2.6: Some straight edges that do not resemble anything in particular. 



Figure 2.7: Three of those edges seem likely to come from the same thing. 

2.7 What Can We Expect from Grouping? 

Previous sections have argued for the usefulness of grouping. They have also pointed out 
that even imperfect grouping can lead to an improvement over a recognition system that 
relies solely on search. This section will attempt to provide a rough idea of the expectations 
we should have for a grouping system by looking at the nature of the problem. It will first 
point out that people appear to be good at grouping. It will then note that the nature of 
the grouping problem exposes it to some inherent ambiguities that make it unlikely that 
we can use grouping to completely divide an image into the objects that produced it. 

People seem quite good at decomposing natural scenes into their component objects, 
even when they can not recognize any of those objects. In a scene of unfamiliar objects, a 
person may recognize the green thing in the corner as a separate entity, without knowing 
exactly what it is. People can do this even when an image consists only of straight edges. 
Figure 2.6, for example contains some straight edges. It seems intuitively, at least to the 
author, that the edges highlighted in figure 2.7 all come from a single object. In any event, 
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Figure 2.8: These three edges seem less likely to come from the same thing. 


Figure 2.9: People tend to group together the nearby pairs of edges, not the more distant 
pairs. 

certain possibilities seem naturally more likely then others. One can at least easily imagine 
that the edges highlighted in figure 2.7 come from one object, but one has a harder time 
seeing the edges shown in figure 2.8 as coming from just one object. 

In fact, the gestalt psychologists have noted that people tend to combine edges into 
groups. In figure 2.9 people tend to form pairs out of the edges close together, while 
in figure 2.10 people combine the edges that point at each other, even though they are 
not the closest together. These examples resemble some given by the Gestalt psychologist 
Wertheimer[26]. As a possible explanation for this phenomena the gestaltists suggested that 
people group together edges particularly likely to come from the same object. Given that 
people seem quite good at separating objects in a scene, even before they recognize those 


Figure 2.10: In this figure, people see the edges that point at each other as separate groups. 
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Figure 2.11: Edges like this could easily depict a single object, or two different objects. To 
decide, we must know what we are looking at. 


objects, and that such a separation can make it easier to quickly and accurately recognize 
objects, it seems plausible that people in fact recognize objects by first performing some 
grouping step, and then identifying the resulting groups. Of course that grouping step 
probably does not look exactly like the type of grouping that GROPER does. Chapter 3 
will discuss some additional information that human grouping may well make use of, at the 
least. But even GROPER’s somewhat limited grouping can explain the type of grouping 
people do with figures 2.9 and 2.10, for GROPER picks the same edges that people do as 
the groups of edges most likely to come from a single object. 

The nature of the grouping problem should not, however, lead us to expect a perfect 
answer, which tells with complete accuracy which parts of an image come from a single 
object. This holds particularly for grouping that only makes use of the edges in a scene. 
First of all, we cannot simply make use of connectivity to perform grouping. Edges from 
the same object will fail to connect because part of the object blends into its background, 
or another object occludes it. And connected edges will often come from different objects 
that overlap. Figure 2.11 depicts this inherent ambiguity. An observer cannot tell whether 
the figure contains one object or two or more, without some model of objects expected in 
the image. And since the grouping phase, as we define it, makes no use of knowledge of 
specific objects, grouping alone can not divide such an image completely into its component 
objects. 

So even with grouping, we still expect object recognition to involve some search. A 
recognition system must explore different possible combinations of edges before it can find 
the ones that match known objects. Recognition and grouping should interact. A recog¬ 
nition module can indicate which groups of edges match some known objects and so bear 
further investigation. It can also determine that some groups of edges match no known 
objects, indicating a need to shift attention elsewhere. At the same time, a grouping sys¬ 
tem can greatly reduce the work that the recognition system needs to do, and increase its 


29 









Chapter 3 


A Theory of Grouping 


3.1 Introduction 

To understand how to form groups from an image we first need to understand the nature of 
the problem as well as possible. In Marr’s terms (Marr[20]) we need a computational theory 
of grouping. The previous chapter has described the input and output of the grouping 
process. We now need to understand the regularities in the world that make grouping 
possible. For example, we might feel at an intuitive level that nearby edges in an image 
have a greater likelihood of coming from the same object than edges separated by a larger 
distance. But we must understand why the world would make this true. This understanding 
will then enable us to make this intuition precise, so that we may answer questions such 
as how exactly this likelihood varies with distance. This chapter will examine a simplified 
world model to get a rough idea of what factors we should take into account in performing 
grouping on images produced by the real world. 

Analyzing a simplified world provides only an approximation to a true theory of group¬ 
ing. Ideally we would like to answer questions such as: “in an image of an average scene how 
far apart do edges produced by the same object tend to be?”, and “how does this compare 
to the distance separating edges from different objects?”. But to answer such questions, 
we would need to know what an average scene looks like. This would require a complete 
description of the kinds of objects that exist in the world, as well as an understanding 
of how often each object tends to appear, and in what orientations. We can not hope to 
produce such a complete model. 

Instead this thesis analyzes a model of a much simpler world. Even the simplified world 
discussed in this chapter does not lend itself easily to a complete analysis. However, looking 
at a simplified world develops our intuitions about grouping, leading to hypotheses that we 
can test with GROPER. 
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In addition to the difficulty of modeling the real world, another source of difficulty 
comes from the large number of factors that might influence our assessment of the probable 
correctness of a group. Suppose we consider digitized images. We can characterize a group 
by the pixels (dots in the image) that form it. Each pixel in the group has a location and 
intensity. We wish to find the probability that all the pixels came from the same object in 
the scene. Any change in the intensity or location of any pixel in the group, or any pixel 
not in the group, might cause this probability to change. Looked at in this way, a theory 
of grouping requires us to solve a conditional probability problem with a huge number of 
variables. 

To make this problem tractable, we will make a number of simplifications and idealiza¬ 
tions. For example, we only consider the edges that an edge detector finds in the image. 
Such a decision has advantages and disadvantages. Edges provide a much more compact 
form for the data. We have fewer variables that might effect the probability we want to 
derive. But this simplification comes at the cost of ignoring information that might provide 
important clues to solving the grouping problem. 

By creating an uncomplicated model world, and then making some idealizations the 
following section will reduce the problem of grouping to one of determining the effect of 
two factors on the likelihood that a set of intensity edges in an image all come from the 
same object: the distance between edges and their relative orientation. The sections after 
that will discuss these two factors in detail. 


3.2 Simplifications to the Problem 


This section will boil down the grouping problem to a more manageable form. It will 
do this by considering only an uncomplicated world, and by ignoring many potentially 
useful pieces of information provided by images. The first subsection of this section will 
present a model of this uncomplicated world, while the second subsection will analyze the 
problem mathematically, using some simplifying assumptions to reduce it to a problem of 
understanding the effect of distance and orientation on the likelihood that two groups of 
edges come from the same object. However, even the more limited problems to which this 
gives rise seem still to shed light on the factors of which a more complete theory of grouping 
would need to take account. The end of this chapter will discuss ways to make this easier 
problem more like the problem presented by the real world. 
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3.2.1 A Simplified World 


The theory of grouping given in this paper requires geometric reasoning. To make this 
reasoning easier, it limits the type of objects that may appear in a scene. We assume that 
all objects are flat, lying on a common plane. We also assume that the scene plane maps 
orthogonally onto the image plane without perspective distortion. This will make it much 
easier to determine the effect of assumptions about the world on the images the world 
produces, because the world and the image have such a simple relationship. 

GROPER also only makes use of polygonal models of objects. This does not actually 
simplify geometric reasoning much, but helps mainly to simplify the recognition process 
that chapter 6 will discuss. 

This thesis also assumes that objects do not have too many points of concavity. We 
will formulate this assumption more precisely when we make use of it. Roughly speaking, 
it means that we expect to find objects composed of substantial convex parts. One might 
imagine a hand made up of five completely convex digits, and a convex base. We would not 
allow a hand made up of many tiny line segments that form a jagged surface full of points 
of concavity. 

We will assume that the world contains randomly oriented objects of random size. Each 
object in a scene appears in a random spot that does not depend on the location of any 
other object in the scene. And each object can occur at any size, chosen from a uniform 
random distribution. In the real world, of course, objects abut or rest on each other, and 
do not appear in random locations. Furthermore, to simplify the analysis of occlusions in 
this world, we assume that a section of an object’s perimeter can have at most one other 
object resting on top of it. When we analyze the lengths of occlusions that tend to occur 
in this world, we will not need to consider the possibility that two different objects can 
combine to cover a continuous section of an object’s perimeter. 

We can also make the grouping problem easier by ignoring some information in the 
image. This allows us to focus on just one piece of the problem. 

First of all, this report will consider only a theory of grouping of the occluding edges in 
the image. 

Secondly, we consider only the local rather than the global grouping process. To evaluate 
the likelihood of a group of edges coming from the same object, we really should consider 
two kinds of global factors. First of all, the locations of all the other edges in the scene 
may effect the probable correctness of the group under consideration. If something in the 
group seems to go well with edges not in the group, this might provide evidence against 
a particular grouping. Secondly, we should consider all the relationships between edges 
in the group. And this group could be arbitrarily large, creating a complex network of 
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relationships. 

To avoid these complexities, we will consider only a simpler case. We assume that we 
have two groups of edges, each known to come from a single object. Furthermore, for each 
group we know the order of the edges within the object, so that we know the point at which 
the section of object producing the edges began, and the point at which it ended. We assume 
that for each edge in a group, we know which side is figure and which is background. This 
might seem to contradict GROPER’s assumption that it does not begin with knowledge 
of figure and ground. However, GROPER, in using a theory that assumes this knowledge, 
just allows each edge in the image to create two possibilities, one with the figure on one side 
of the edge, and one with the figure on the other side. Chapter 5 will discuss the grouping 
algorithm that does this in more detail. 

We want to calculate the probability that all the edges in these two groups come from 
a single object, and furthermore, that no intervening edges of the object actually appear 
in the image, for some path between the two groups of edges. This allows us to consider 
a simpler local problem that will provide a first step in solving the more general global 
problem. When we use the word “group” in the future, it will refer to a group of edges like 
this: a group of edges with an order, a beginning and an end. 

Thirdly, we do not consider the shapes of groups of edges, only their relative orientation. 
We can characterize a group of edges by the points where it begins and ends, and the 
direction of the edges at those points. This characterization throws away everything that 
happens in between those two points. Using this representation, we can no longer make 
use of facts such as whether or not two groups of edges contain symmetries. But we have 
simplified the problem so that now we must only consider the effect of a small number of 
variables on the acceptability of a group. We could have used a more complex description of 
a group of edges, or used a simpler one. For example, we might have ignored the direction of 
the edges that start and end a group. The choice made here is based on the hypothesis that 
this description provides the minimal amount of information that still allows us to perform 
effective grouping. Changing the direction of the starting and ending edges affects our 
intuitions about groups, but changing what happens in between these points does not seem 
to have as big an effect, unless such changes introduce symmetry and parallelism. Figure 
3.1 attempts to gives some examples that develop this intuition, although it remains, at 
heart, a guess. 

To summarize, this simplified world has the following characteristics: 

• It contains only edges produced by the perimeters of objects. 

• It contains only two dimensional objects. 
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Figure 3.1: The left and right pictures provide separate examples. Each example contains 
two pairs of groups that have the same characterization, with this characterization depicted 
on the bottom. Although each of the two pairs of groups look quite different, they seem 
to “go together” about equally well. That is, our intuitions about whether they come from 
the same object seem unaffected by the details of the shapes of the edges. 
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• It contains only polygonal objects. It may contain any set of polygonal objects, as long 

as they meet the constraints in this list. 

• Objects do not have too many points of concavity. 

• Objects appear at all different scales, up to some maximum size determined by the size 

of the image. Within this range, the scale of an object occurs according to a uniform 
random distribution on the area of the object. 

• All objects have a uniformly distributed, random location in the scene. Furthermore, all 

objects are equally likely to appear. 

• No section of an object lies on top of more than one other object. 

• Images are formed with orthographic projection. 

Furthermore, this analysis considers only local, rather than global relations between 
groups, using a simplified characterization of each group. 

3.2.2 Breaking the Problem into Two Parts 

This section will perform some simple probability analysis to make clear exactly what we 
need to do to solve the problem that remains. To make the problem easier to handle, we will 
divide it into two parts: the effect of distance and the effect of relative orientation on the 
likelihood of two groups of edges coming from the same object. We will also find that the 
assumptions made above still do not allow us to calculate exact probabilities. So instead 
we will focus on determining the shape of probability distributions. This will allow us to 
compare the relative likelihood of different pairs of groups coming from single objects. We 
will see that we can understand what makes one pair of groups more likely than another 
pair to come from a single object, without determining the absolute probability of either 
pair coming from a single object. 

First of all some notation will make it easier to discuss these concepts. The reader 
should probably glance at this summary of notation, and then refer to it as the terms 
appear in the text. 

0\,C>2'~ We will discuss the problem of evaluating a combination of two groups of edges. 
We call the object that produced the first group 0\, and the object that produced 
the second group 0 2 . 

0\ — 02 : This denotes the fact that the two groups of edges come from the same object. 
0\ ^ 0 2 : This indicates that they come from different objects. 
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Figure 3.2: Two groups. The shaded area indicates their projection. The projection of the 
group on the left goes on infinitely. 

projection Intuitively, projection refers to the area pointed to by a group of edges. More 
precisely, consider the following three half-planes derived from a group. First, the 
line defined by the beginning and end points of the group forms a half-plane that 
does not include the group. Second and third, the lines defined by the first and last 
edges in the group form half-planes that include all the edges of the group. We call 
the intersection of these half-planes the group’s projection. Figure 3.2 provides an 
illustration of projections. 

aj, <Z 2 , 03 , 04 , li, I 2 , d, type : These symbols denote the variables that describe the two groups 
in question, and their relationship. Figure 3.3 depicts these variables. 

luhi Each group has two end points, li indicates the distance along a straight line 
from the beginning point to the end point of group one. I 2 denotes a similar 
distance for the second group. 

04 , 02 , 03 , 04 : We also need two angles to describe each group. One will indicate the 
angle at which the group’s projection spreads out, that is, the angle between its 
first and last edges. The other tells us the angle between the group’s projection 
and the group itself. Oi and 02 will indicate the first of these angles for groups 
one and two respectively. 03 and 04 denote the second angles for groups one and 
two. 

d : This represents the minimum distance between the ending point of one group 
and the beginning point of the other. We arbitrarily define the beginning point 
so that if we follow the group from beginning to end we keep the object that 
produced the group on the right. We do not need to consider the possibility of an 
object with a perimeter that joins the beginning of one group to the beginning 
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Figure 3.3: Seven of the variables that describe the relationship between two groups. An 
eighth variable tells us the groups have a type 2 relationship. 
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Figure 3.4: The three types of orientations that groups may have. The shaded areas indicate 
the projections of the groups. 

of another, because such a connection would cause figure and background to 
reverse. 

typei,type 2 , types: We have used seven variables to capture (almost) seven of the 
nine degrees of freedom that describe two groups and their relationship. The 
remaining two variables describe the relative orientation of the two groups. The 
type of the two groups divides this orientation into three classes. 

types: type has this value when the two groups could come from a single convex 
section of an object. This happens when all the edges in each group fall 
inside the other group’s projection. 

type 2 ’- This value indicates that the two groups could come from adjacent convex 
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Figure 3.5: inti and int 2 , the distance to the intersection of the projections of two groups. 

The shaded area indicates where the projections intersect. 

sections of the same object. This occurs when the two projections intersect. 
type 3 : type has this value when it does not have the value type 1 or type 2 . 

Sometimes we will abbreviate the situation type = type 1 by just saying type\. 
Context should make this clear. Figure 3.4 provides examples of the three types 
of orientations. 

inti, int- 2 ’. These variables apply only if type = type 2 . In that case, the projections of 
the two groups intersect, inti is the minimum distance to this area of intersection 
from the first or last point in the first group. Suppose these two groups really 
do come from adjacent convex sections of the same object. Then inti represents 
the minimum amount of perimeter that is part of the same convex section as 
the edges in the first group, but that did not show up in the image. We define 
int 2 similarly for the second group. Figure 3.5 provides examples of intersection 
distances. 

data, rest of data'. We will abbreviate d,ai,a 2 , 03 ,ai, li, I 2 , type, inti, int 2 with data. 
We will refer to all the data except for some obviously excluded variables with 
rest of data. Context should make clear to which variables rest of data refers. 

We can now state formally what we would like to compute: 


P ( 01 — 0 2 1 d, , a 2 , U 3 , U 4 , type , tnti, znt 2 ) 


That is, we want to find the conditional probability that two groups of edges come from 
the same object, given values for all the above variables. Since these variables do not fully 
specify the relationship between the two groups, this formulation of the problem makes 
a simplification by ignoring some information available in the relative orientations of the 
groups. 
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Using Bayes’ theorem we can transform this problem into an equivalent one: 


P{0\ = 0 2 |data) = 


P(data\Ox — 0 2 ) * P{0\ = 0 2 ) 
P(data) 


Since 


P(data) = P{data\0\ - O 2 ) * P{0\ = 0 2 ) + P{data\0\ ^ 0 2 ) * -P(0i 7 ^ O 2 ) 
we find that: 

1 - 2I a a) - p ^ data | 0j = * p(0i = o 2 ) + P(data|Oi ± 0 2 ) * P(Oj 7 £ 0 2 ) 

P(Oi = 0 2 ) and P(Oi 7 ^ 0 2 ) refer to the probability of two groups of edges coming 
from the same object in the absence of any information about the two groups or their 
relationship. This will depend on the number of objects we expect to find in a scene, and 
on how many groups of edges we expect each object to produce. However, we do not need 
this information to compare the relative merits of two different combinations of groups in 
an image. To see this, notice that: 

1 ( P{data\O x £ 0 2 ) P(0 l ^0 2 ) \ 

P{Oi = 0 2 \data) \Pidata\Ot = 0 3 ) * P{0 X = 0 2 )J 

Suppose we have two sets of data, data and data' , for two different combinations of groups. 
P(0 1 7 ^ O 2 ) and P(Oi = O 2 ) are the same for any combination of groups in an image 
because they do not refer to the parameters that describe those groups. So, if: 

P(data\Oi £ 0 2 ) P{data'\0' x ^ O') 

P{data\0\ — O 2 ) > P(data'\0' 1 = 0 ' 2 ) 

then: 

P{0\ = 0 2 |data) < P{O x = 0' 2 \data') 

So in order to pick the most likely of a number of possible groupings, we need only be able 
to calculate: 

P{data\0\ 7 - 0 3 ) 

P{data\0\ — 0 2 ) 

We now need to determine how to separate data into component parts, in order to 
analyze the effects of individual variables on the total probability we need to compute. 
Notice that: 


P{data\0\ = O 2 ) = P(d\Oi = 0 2 ) * P(rest of data\d,Oi — 0 2 ) 

This isolates the effect of d, but now we must look at P(rest of data\d,Oi = 0 2 ). 

P(rest of data\d, 0\ = 0 2 ) = 
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P(/x,/ 2 ,ai,a 2 ,a 3 ,a 4 !c?, Oi = 0 2 ) * P(type, inti, int 2 \li ^ 2 > ai, a 2 > a 3 , a 4 , d, 0i = 0 2 ) 
We are going to remove the first term in this product by assuming that: 


P{h,h, ®i) «2) 0.3, a^d, Oi = 0 2 ) = P(/i, / 2 , ai, a 2 , 03, a4|0i — 0 2 ) 


and 

P(?li ^2 1 ®1 5 1 ^3? ^4|d, 01 7 ^ 02 ) — P(?l> ? 2 > ®15 ^ 2 > 101 7^ 02 ) 

and 

P(?l} ? 2 i ^1 ? 0 2 , O 3 , O 4 |01 — 0 2 ) — -P(?l, / 2 5 Uli U 2 , 03 , U4|0x 7 ^ 02 ) 

These assumptions are not generally true, and so involve a further simplification of the 
problem. 

What do these assumptions mean? The first two state that the variables that describe 
the shape of two group are independent of the distance between them. If the groups come 
from different objects, as in the second assumption, then this seems to follow from the 
earlier assumption that our world contains only randomly oriented objects. However, if 0\ 
and 0 2 overlap, then these variables may not be independent. A single cause, the overlap, 
may help determine the way both groups begin and end. If the groups come from the same 
object (assumption one) this implies that the way that two groups of edges in the same 
object start and finish does not depend on the distance between them. This may not be 
true in general if, for example, objects have patterns that repeat in only spatially localized 
areas. The third assumption implies that the way we expect groups to start and end does 
not depend on whether or not they come from the same object. This may not be true in 
general for similar reasons. Intuitively, these assumptions mean that we are only dealing 
with a world that contains random, unstructured objects. By random I mean objects for 
which knowing about one part of the object does not tell us about other parts of the object. 

Why make these particular assumptions, given that the world may violate them? These 
assumptions arise from the intuition that while the shape of a group may tell us a good 
deal about the shapes of nearby groups in the same object, the way a group begins and 
ends will not tell us much about the way nearby groups begin and end. We should deal 
with this type of information only when we have an approach to grouping that tries to take 
into account the complete shapes of different groups of edges. We make a tactical decision 
to ignore this kind of information in the hope that we will not go too far wrong, and that 
we will now be able to focus on more crucial factors. 

Given the three assumptions above, and the equation that stated: 

1 = - ( Pjdata^ £ 0 2 ) * P(Oi tO*) ) 

P(0i = 0 2 ) \P(data\Oi = 0 2 ) * jP(0i = 0 2 )J 
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we can see that P(li, l 2 ,ai,a. 2 ,a 3 ,ai\d,Oi = O 2 ) and P(li, h-, Oi? 0 , 2 , 03 » a.i\d, 0\ 0 2 ) 

cancel each other. That is: 

1 , P{d\0\ O 2 ) . * , P(type,inti,int 2 \rest of data,Oi 0 2 ) 

P(Oi — O 2 ) P(d\Oi = O 2 ) P(type, inti, int 2 \rest of data, Oi — O 2 ) 

The next two sections will analyze the distance part, and the orientation part of this 
equation. 

For convenience, we will list the assumptions relied on in this section that we did not 
set out in the previous section: 

• We only attempt to discover the relative merits of different combinations of groups, 
not the absolute probability of a group’s correctness. 

• We ignore any information about a group’s orientation not captured in the type, inti , 
or int 2 variables. 

• We ignore the dependence between the angles and lengths that describe two groups 
and the distance separating the groups. 

• We do not analyze the dependence between the way groups start and end, and whether 
they come from the same object. 

3.3 The Effect of Distance 

This section will provide estimates of the distributions of P(d\Oi = O 2 ) and P(dfOi ^ O 2 ). 
It will argue that if groups come from the same object, we expect shorter rather than longer 
distances to separate them. If they come from different objects, we expect the reverse. 

3.3.1 When Groups Come from the Same Object 

To find P{d\Oi = O 2 ) we will ask what causes a section of an object’s perimeter to fail to 
produce an edge in the image. This is not the only cause of some distance separating two 
groups. There may be another group of image edges that accounts for some of the object 
perimeter connecting the two groups. However, we focus on the possibility that two groups 
not only come from the same object, but come from nearby parts of that object because 
we expect such situations to produce the best groups, which we will want to use first for 
recognition. We assume that if two groups came from widely separated parts of the same 
object, the data will prove a less reliable indicator of whether the two groups came from 
the same object. 
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Two different causes can prevent a section of an object’s perimeter from creating an 
edge in an image: occlusion and blending into the background. The following paragraphs 
will argue that these methods tend to create shorter, rather than longer sections of missing 
perimeter. 

Occlusion 

We begin by discussing the distribution of the lengths of occluded sections of objects. That 
is, suppose, given the simplified world we are analyzing, we generate a random occlusion 
between two objects. We would like to know the probability distribution of the distance 
separating the beginning and the end of the occluded section of perimeter. This will tell 
us the distribution of some of the distances that separate different groups that come from 
the same object. We will proceed by first pointing out a correspondence between the 
distribution of lengths of occluded object, and the distribution of distances between points 
within objects. We will first cover a more limited domain, that of completely convex objects, 
and then discuss how to extend the result to general polygonal objects with some points 
of concavity. Finally, we will discuss the bearing this result has on the other ways that 
sections of perimeter fail to produce image edges. 

In a two dimensional world, occlusion occurs when one object lies on top of another, 
blocking part of its perimeter. When one object blocks a contiguous section of another’s 
perimeter it means that the two objects’ perimeters overlap at two points, at the beginning 
and end of this missing section of perimeter. Furthermore, the same distance separates the 
two points on each object. 

It is difficult to determine the likelihood that two randomly oriented objects will produce 
an occlusion with a length in a given range. But we will argue that this distribution is related 
to the distribution of the distance separating two randomly chosen points on the perimeter 
of each object. 

To find the real distribution, we could divide each object’s perimeter into short sections. 
Call these sections (si, s 2 , ...s n ) and {h,t 2 , To find the probability that an occlusion 

of length d to d + 8 will occur, we can just sum over all i,j,k , l the probability that an 
occlusion of that length could arise with sections s t - and Sj overlapping tk and £/. This 
probability will depend on how much of s, is the appropriate distance from section Sj , and 
how much of tk is the right distance from £/. So, as we make the sections arbitrarily small, 
the probability distribution will depend on how many pairs of tiny sections on the first 
object’s perimeter are between d and d + 6 apart, and how many on the second object are 
that distance apart. We can find this out by examining the probability distribution of the 
distance separating two randomly chosen points from each object. 
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Figure 3.6: Each side of the above illustration shows an occlusion. Each polygon has two 
sections of its perimeter marked, and each section of perimeter overlaps a section from 
another object. The objects on the left must align perfectly for an occlusion to occur, while 
the objects on the right have some leeway in their orientation. 

However, the likelihood of two pairs of sections of perimeter, one from each object, 
producing an occlusion, does not just depend on the distance between the sections, it 
also depends on the angle between them. Figure 3.6 shows two examples. On the left, 
we see two objects with two sections of perimeter marked on each object. Because the 
sections of perimeter are parallel in each object, there is a measure zero probability that 
an occlusion will cause these four sections of perimeter to overlap. On the right, we see 
an example of four sections of perimeter that really may produce an occlusion. We will 
ignore this complication, however, and assume that we may reasonably approximate the 
distribution of occlusions produced by two objects by assuming that a one-to-one mapping 
exists between each occlusion of distance d and each pair of points on each object separated 
by the distance d. We can find this distribution by multiplying the distribution of the 
distance that separates two randomly chosen points from the first object by a si mil ar 
distribution for the second object, and then normalizing the resulting distribution so that 
it has an integral of one. 

We will analyze the probability distribution of the distance between randomly chosen 
points on a single object in order to show that short occlusions will occur more often than 
long ones. This probability distribution will vary from object to object. So we would like 
to get an idea of the worst possible distribution for proving our hypothesis. The object 
that produces the longest occlusions will be the object with the longest distances separating 
randomly chosen points; we want to analyze that object. 
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Figure 3.7: Three randomly chosen, convex polygons. 

Since we assume that any object appears at randomly chosen scales, we do not care 
about the absolute distance between two points on an object; that distance will depend 
on the scale we choose in any event. So, in comparing different objects, we normalize this 
distance based on the size of the object. To do this we take each object at a scale for which 
a distance of one unit separates the two points farthest apart. 

In this report we conjecture that using this standard of comparison, a circle will provide 
the worst possible case for our hypothesis. That is, for a circle, the distance between 
randomly chosen points, divided by the maximum possible distance between two points, 
will tend to be highest. Intuitively this makes some sense because, for a given perimeter, 
a circle minimizes diameter while placing each point as far as possible from the others. 
Figures 3.7 and 3.8 show some support for this. Figure 3.7 shows some randomly chosen 
convex polygons. Appendix A explains how we construct them. Figure 3.8 shows that pairs 
of randomly chosen points tend to be much further apart for the circle than for any of the 
polygons. 

We claim that a circle provides the worst case distance distribution in the following 
sense: that for any distance, d, the probability of randomly choosing two points on the 
circle less than a distance d apart will be less than it will for any other convex object, 
providing we normalize each object based on the maximum distance separating any two 
points. This thesis does not offer a proof of this conjecture, but experiments do support 
it. Of 400 randomly chosen convex objects, all produce probability distributions for which 
this conjecture holds. 

So suppose circles do offer a worst case for us to consider. This means that a universe 
of circles will tend to produce longer occlusions than any other universe of objects. But 
we still assume that circles appear at all possible scales. When we average the circle’s 
distribution over all scales we get a monotonically decreasing distribution. This assumes 
we weight the averages of the distributions with the perimeter of the circle, as circles with 
longer perimeters can produce more possible occlusions. This averaged distribution means 
that if we randomly select two points from a random sized circle, we are more likely to get 
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Figure 3.8: The distribution of a variable obtained by randomly selecting two points from 
the perimeter of a shape, and measuring the distance that separates them. We normalized 
the circle and the polygons so they have the same diameter. 

points separated by shorter distances than longer ones. 

Once we have the distribution of distances between pairs of points for a particular library 
of objects, we can determine the distribution of occlusions for that library by taking the 
normalized square of the initial distribution. The distribution of lengths of occlusion is just 
the distribution of the distance separating two randomly chosen pairs of points from the 
same object, when the same distance separates those points. This distribution corresponds 
to the distribution achieved through the following process: randomly select two points from 
a randomly chosen object; then randomly select two more points from another randomly 
chosen object; if the difference in the two distances between the two pairs is less than fid, 
we have a random variable; otherwise, start over. As fid goes to zero we get the distribution 
of lengths of occlusions. Clearly, for any distance d, the chances of choosing two sets of 
points with distances in the range (d - fid, d + fid) is just the square of the chances of 
picking one pair of points separated by a distance in that range. So the distribution of the 
distances separating two pairs of points is just the square of the distribution for single pairs 
of points randomly chosen from our library. Of course, after squaring the distribution, we 
must normalize it so that it has an integral of 1. As we mentioned, this analysis assumes 
that each occlusion corresponds to a pair of two points on each object separated by an 
equal distance. This ignores the additional effect of the two angles between the points. 

We now have a worst case distribution for the lengths of occlusions produced by any 
library of convex objects: the distribution produced by a library containing just a circle. 
Figure 3.9 shows this distribution, as well as the distribution of occlusions we expect when 
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Figure 3.9: The square of the distribution obtained by taking a shape at a random scale, 
and measuring the distance between randomly chosen points. The distribution is calculated 
analytically, not experimentally. 



Figure 3.10: Not every set of two pairs of points on two concave objects corresponds to a 
single occlusion. 

our library contains each of the three randomly chosen polygons. 

If objects have concavities we no longer have a one-to-one mapping between occlusions 
and pairs of pairs of points separated by equal distances. With concave objects, in between 
the points where the two objects overlap, one object’s perimeter may dip back inside the 
others (see figure 3.10). We will not discuss this issue much, but will just point out one 
reason why we might expect objects with concavities to produce even shorter occlusions. 

Consider the case where we have two lengths of perimeter, Pi and P 2 . P\ begins at 
point pn and ends at p\ 2 . P 2 begins and ends at pn and P 22 * First of all, it seems obvious 
that the longer a section of perimeter, the more likely it is to contain a concavity, because 
we only consider objects with relatively few points of concavity. And unless either P\ or 
P 2 contain a concavity, then the set of points pu, pi 2 , p 2 i> and P 22 will correspond to an 
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Figure 3.11: The lengths of occlusions that occur from randomly intersecting randomly 
chosen objects. 

occlusion. If they do contain a concavity these points may or may not correspond to an 
occlusion. Provided we assume that the longer the distance between the end points of a 
section of perimeter, the longer the section of perimeter is likely to be, the likelihood of a 
concavity falling inside a section of perimeter then tends to favor short rather than long 
occlusions. 

The other question we would have to consider for full analysis of this problem is whether 
longer rather than shorter sections of perimeter that do contain concavities have a greater 
chance of arcing back inside each others boundaries like the one in figure 3.10. 

We can test these theoretical arguments empirically. To do so, we have constructed 
random objects, as described in Appendix A, and used them to form random occlusions. 
We fabricated convex objects, objects with two convex parts, with four convex parts, and 
with six convex parts. Figure 3.7 shows examples of random convex objects. Figure 3.11 
shows the distribution of lengths of occlusion that these objects produce. We find that long 
occlusions do occur more rarely than short ones. 

This section has not presented a proof that short occlusions occur more often than long 
ones. Rather, an analysis of the problem has suggested a reasonable conjecture that we can 
support with empirical evidence. 

Blending into the Background 

A section of an object’s perimeter may also fail to create an edge when it blends into its 
background, that is, when the part of the scene behind the object has the same reflectance 
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as the object itself. This section will discuss this process, pointing out that it bears a 
close resemblance to the process of occlusion. First, we will make use of a simple model 
of objects’ reflectances. For this model, the length of missing perimeter from blending into 
the background will have the same distribution as the missing lengths due to occlusions. 
We will then point out that a more realistic model of reflectances will tend to produce even 
shorter sections of missing perimeter. 

First let us assume a quite simple model of object reflectances, which has three proper¬ 
ties. One, the scene has a single background plane that has a uniform reflectance. Further¬ 
more, no object in the scene has the same reflectance as this background plane. Think of a 
scene in which many dark objects lie on a white table. Second, every object has a uniform 
level of reflectance; the intensity an object produces in an image will not vary throughout 
that object. Third, each part of the scene has the same level of illumination. These three 
assumptions imply that an object blends into its background only when it rests on top of 
another object with the same reflectance. No object blends into the scene’s background 
plane. 

In such a world we find a correspondence between lengths of perimeter missing from 
blending into the background and missing from occlusion. Whenever one object lies on 
top of another, occluding it, if the two objects have the same reflectance the object on top 
will blend into the object on the bottom for exactly the length of the occlusion. Not every 
occlusion also causes blending in, the two objects may have different reflectances. But since 
the scene contains only randomly oriented objects, we may assume that the reflectance of 
two objects bears no relation to the length of the occlusion they produce. So, some random 
subset of the occlusions in a scene will also produce missing lengths of perimeter due to the 
object on top blending into the object on the bottom. And this missing perimeter will have 
the same distance between its start and end as the occlusion did. Therefore, the distance 
between the start and end of sections of an object that blend into the background will have 
the same distribution as the distance produced by occlusions. 

We simplified the previous section by not allowing two occlusions to join together to 
form one long occlusion. This same restriction prevents two separate events of blending 
into the background to join together. 

Now we will briefly discuss what happens with a more realistic model of the world. First 
of all, in a more realistic model, objects may occasionally have the same reflectance as the 
background plane. However, we can still assume that this happens infrequently, and plays 
only a minimal role in the total effect that we are studying. Secondly, in the new model the 
reflectance of an object may vary. These variations in reflectance will produce even shorter 
sections of missing perimeter than the simpler model. 

Even in this more complicated model, each section of missing perimeter will usually 
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stem from one object lying on top of another object with the same reflectance. But now, 
the section of missing perimeter may be shorter than the occlusion. Where one object lies 
on top of another, the two objects may have the same reflectance for some short space 
of time. But the reflectances of the two objects will change in independent ways. This 
means that the two objects may have the same reflectance for some space, but at any 
point there is a chance that the reflectance of one of the objects will change. Only through 
an unlikely coincidence will the reflectance of the other object change in the same way. 
So, at worst, the two objects might have a constant reflectance throughout the length of 
the occlusion, creating a missing section of perimeter equal in length to the length of the 
occlusion. This more complex model alters things only by allowing one of the object’s 
reflectances to change somewhere in the middle of the overlap, which will usually result in 
a shorter section of missing perimeter than that produced by the occlusion. Therefore the 
distribution of sections of perimeter missing from blending into the background will tend 
to produce even shorter sections of missing perimeter than will occlusions. 

When Nothing Gets in the Way 

The previous two sections discussed the likelihood of events occurring that will prevent part 
of an object’s perimeter from making an edge in the image. Neither discussion indicates 
the probability that a distance of 0 will separate two groups of edges, because nothing has 
caused their perimeter to fail to appear. Fortunately, since we only need to calculate: 

P{d\Ox = 0 2 ) 

P(d\Ox £ 0 2 ) 

we do not need to determine either the numerator or denominator separately. The forth¬ 
coming discussion will show on what this ratio depends. 

3.3.2 When Groups Come from Different Objects 

This section will analyze the problem of determining the chances of a given distance sepa¬ 
rating edges that come from different objects. We will do this in stages. First we will do 
it for a somewhat simplified version of the problem. We will then point out flaws in this 
analysis, and get a more accurate approximation of the answer. 

A Simple Analysis 

Each group of edges has a beginning and an end. If the sections of objects these groups 
come from are randomly oriented in the image, we can assume (falsely as we will see) that 
the beginning and end of each group has a random location. The distance between the 
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Figure 3.12: The distribution of distances between randomly oriented groups. This was 
obtained by randomly orienting two line segments in a square image, and finding the min¬ 
imum distance from the start of one to the end of the other. The likelihood rises steadily, 
and then begins to drop as the size of the image begins to restrict the distance that can 
separate the two edges. 

beginning and end of a group will of course constrain their location. So this situation is 
just like randomly placing two line segments in the image. 

We can now find the distance between the two groups, as we have defined this distance, 
by taking the minimum of the distance between the beginning of each group, and the ending 
of the other. Randomly locating two line segments is like randomly locating the beginning 
of one and the ending of another, and then randomly locating the remaining two points, 
subject to the constraint provided by the known lengths of the segments. If we call the 
beginning of the two groups pn and P 21 , and the ending points pi 2 and p 22 , we find we 
need to compute the distribution of: 

-POIPIHP22II - do) * F > (||P21,P12|I >= do) -I- -P(||Pl2»P2lll = do) * F > (||Pll,P22ll > do) 

Figure 3.12 shows the distribution this process produces for several sample values of 
lengths separating the beginning and ending of the two groups. Analytically we can see 
that for small groups the probability will increase linearly with the distance between the 
two groups. We can see this because the points in the plane within a given distance away 
from a point, fall within a circle. The area of this circle increases with the square of its 
diameter. The derivative of this, i.e. the rate of change of the area, increases linearly with 
the distance. 
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A Refined Analysis 

The above analysis made the unwarranted assumption that the position of the beginning 
and end points of one group of edges is independent of the beginning and end points of the 
other group. But suppose that the two objects overlap, covering up part of one of these two 
groups of edges. One of the groups of edges might have to end at the point of the occlusion. 
Let us consider a simple case. If the two objects have the same reflectance, the perimeter 
of the object on top will not create any edges where it lies over the bottom object. The 
two sections of perimeter will co-terminate, creating a single curve, with a concavity where 
they meet. (Two objects form a concavity at a point of occlusion except in the case of an 
extremely unlik ely alignment of the objects). A distance of 0 will separate the two groups 
of convex edges formed in the image. To create a more realistic analysis, we must consider 
what happens if the two groups of edges had their end points determined by an occlusion 
involving both of them. 

Two factors will influence the impact of such an occlusion. First of all, we must consider 
the probability of such an occlusion occurring. Then we must consider the possible distances 
between the groups when such an occlusion has occurred. 

If no such occlusion occurs, we can use our previously developed analysis. So, we want 
to combine that analysis with an analysis of what happens when an occlusion occurs, based 
on the probability of an occlusion. 

P(d\Ot £ 0 2 ) 

= P(d\Oi £ 0 2 , groups do not overlap) * P(groups do not overlap\0\ 0 2 ) 
+P{d\0\ ^ 0 2 , groups overlap) * P(groups overlap\0\ £ 0 2 ) 

We do not develop the probability of such an occlusion occurring. 

In the case where the two objects have the same reflectance, an occlusion of the two 
groups has a simple interpretation. One of the groups ends where the other begins. Thus, 
distance equals zero, unless something else occludes this point. This means that in the 
case of an occlusion, we get essentially the same distribution of distances that we get with 
two groups from adjacent convex sections of the same object. If nothing occludes this 
connection, we get a distance of zero between the two groups. If something does occlude 
them, the distance between the two groups equals the length of the occlusion. This also 
provides an answer to a question raised earlier. 

P(d = O|0i = 0 2 ) 

P(d = 010^02) 

is just one divided by the probability of two groups of edges from different objects overlap¬ 
ping. 
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So, the distribution of distances produced by groups coming from different objects 
equals a linear combination of two curves. The first curve also describes the distribution of 
distances when the groups come from the same object. This curve applies when the groups 
overlap. The second curve describes the distribution of the minimum distance between the 
end points of two randomly oriented line segments. The way we combine these two curves 
depends on the chances of two groups from different objects occluding each other. 

A Refined Refined Analysis 

Unfortunately, this analysis is not quite right either. We did not consider all the possibilities 
when we said that either the two groups have independently located end points, or else 
they intersect. For example, if the two groups fall close to each other, a third object might 
occlude them both. A complete analysis would require considering factors that might 
affect the location of the end points of the groups in a non-independent way, even when the 
groups do not intersect. We will not consider these cases, however. We assume that the 
distribution of distances separating two groups in this situation will not differ much from 
the distribution when the groups have independent locations. 

3.3.3 Summary 

Although incomplete, our analysis suggests that the likelihood of a distance separating two 
groups that come from the same object will tend to decrease with the distance, with the 
distribution a circle produces as the worst case. The distribution when the groups come 
from different objects will approximate a linear combination of this curve and another that 
increases linearly with the distance. For larger distances, the second curve will dominate. 
For smaller distances, the first curve will. So for smaller distances, will become 

the inverse of the probability that two groups of edges from different objects intersect. For 
larger distances the value of this ratio will decrease rapidly. 


3.4 The Effect of Orientation 

This section will discuss the impact of different factors on the relative likelihood of dif¬ 
ferent types of orientation occurring. Dividing the possible relative orientations of groups 
into three types provides only a coarse description of them. This description proves useful 
nonetheless because types corresponds to something real about objects. The type classifi¬ 
cation of two groups tells us whether they could have come from the same convex section of 
an object, from adjacent convex sections, or only from non-adjacent sections. This makes 
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it easier to calculate the probability of different types occurring if the groups come from 
the same object. 

This section will leave some questions unanswered about the exact probability of certain 
events occurring. It will try instead to understand the general shape of the distributions in 
question, as well as which variables in the data will influence this shape, and which variables 
we can hope to ignore. 

3.4.1 When Groups Come from the Same Object 

This section will consider the probability of each of the three possible types occurring when 
two groups come from a single object. Recall that the type of orientation of two groups 
indicates whether they could have come from the same, adjacent, or only non-adjacent 
convex parts of an object. The discussion of all of the types will follow similar lines. For 
each type, we will divide the probability into three parts. What if the two groups actually 
come from the same convex section of an object? What if they come from adjacent sections? 
What if they do not? This will lead to questions like, what is the probability that groups 
coming from adjacent convex sections of an object will appear to have come from adjacent 
convex sections? We will need to discuss the following nine problems (abbreviating “groups 
actually come from same section” with “same”, “groups come from adjacent sections” with 
“ad;”, “groups come from non-adjacent sections” with “ notadj ”): 

P(typei\same, rest of data, 0\ = 0 2 ) 

P(typei\adj, rest of data,0\ = 0 2 ) 

P(typei\notadj,rest of data,0\ — 0 2 ) 

P(type 2 \same, rest of data, 0\ = O 2 ) 

P(type 2 \adj,rest of data, Oi = O 2 ) 

P(type 2 \notadj,rest of data,0\ = O 2 ) 

P{typez\same, rest of data , 0\ — O 2 ) 

P{typez\adj,rest of data, 0\ — O 2 ) 

P(type 3 \notadj,rest of data, 0\ = O 2 ) 

We choose to decompose the problem in this way because many of these probabilities 
have simple answers. For example, P{type3\adj,data,0\ - O 2 ) equals 0, since if two groups 
come from adjacent convex sections of an object, they can not look as if they could not 
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possibly come from the same or adjacent convex sections. Two other probabilities also 
equal zero, while P{type\ | same, data, 0\ = 0 2 ) equals one. 

The probabilities discussed here will all depend on the probabilities of the sections of 
objects from which the two groups come having various kinds of relationships. For example, 
to calculate some probabilities, we would need to know the chances of two randomly chosen 
groups of edges in the image coming from the same convex section of an object, based on 
some of the data and on previous expectations. We see this from the equations : 

P(typei\rest of data, 0\ =0 2 ) 

= P(typei, same\rest of data, 0\ = 0 2 ) + P(typei,adj\rest of data, 0\ = 0 2 ) 
+P(type\, notadj\rest of data,0\ = 0 2 ) 

= P(typei\same, rest of data, 0\ = O 2 ) * P(same\rest of data, 0\ = 02)etc.... 

So dividing the problem into the nine subproblems listed above only makes sense if a 
real probability exists that groups come from the same or adjacent sections of an object. To 
argue that this classification will be productive we need an earlier assumption that objects 
contain relatively few convex parts. This means that two nearby sections of edges in an 
object have a reasonable chance of coming from the same or adjacent convex parts. 

As the above equation shows, to calculate these nine probabilities exactly, we would 
need to know things like the P{same\rest of data,0\ = O 2 ), where rest of data does 
not include type information. This will depend on a variety of factors such as the number 
of convex sections in each object. We might calculate these probabilities for a specific 
application, for which we know which objects will appear in scenes. We can not calculate 
them in general, however, and so this thesis will not discuss their effect. 

The Probability of type\ Occurring 

We would like to determine the probability of two groups having a type 1 relationship given 
that they come from the same object, and also given all the other variables that describe the 
groups. We can do this by looking at the probability depending on whether they actually 
come from the same convex section, adjacent ones, or non-adjacent ones. So we find: 

P{type\\rest of data, 0\ = 0 2 ) 

= P(typei\rest of data, 0\ = 0 2 , same ) * P(same\rest of data, 0\ = 0 2 ) 
+P(typei\rest of data, 0\ — 02,adj) * P(adj\rest of data,0\ = 0 2 ) 

+P(typei|rest of data,0\ = 0 2 , notadj) * P(notadj\rest of data,0\ = O 2 ) 
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P(type\\rest of data, 01 = 02, same) = 1. We will make some general comments on 
the remaining parts of the above equation. 

Suppose first that we have two groups of edges that come from adjacent convex sections 
of the same object. This greatly restricts the possible orientations the groups can have. In 
particular, they cannot have a type 3 relationship. This leads us to expect that both type i 
and type 2 relationships will be more likely. 

An example may make this point more clear. Consider two groups, each consisting of a 
single short edge. The projection of each group consists of half the plane. If we randomly 
orient the two edges, there will be about a j chance that they fall in type\, a | that they 
will fall into type 2 , and a | chance they will fall into type 3. If the groups can not fall in 
type^, but have an otherwise random orientation, they will now have a | chance of falling 
in type 1. So, if the only effect of two groups coming from adjacent convex sections were to 
preclude the possibility that they fall into types, then we could conclude that this increases 
the probability that they fall into type 1 or type<i, in a way we can quantify by studying the 
probability of randomly oriented groups falling into various types. 

The reasoning above, then, makes the simplifying assumption that groups from different 
convex sections are randomly oriented, except with the constraint that adjacency may rule 
out certain orientations. Using this assumption, we find that: 

P(typei\adj,0\ = 0 2 , rest of data ) 

P{typei\0\ O 2 , rest of data) 

P{typei\0\ ^ O 2 , rest of data) + P(type2\0i £ O 2 , rest of data) 

In the section on groups that come from different objects, we will see how to calculate these 
probabilities. Extending this assumption to groups that come from non-adjacent sections 
of the same object, we see that it implies that these groups have random orientations with 
regard to each other, and 

P{typei\notadj,0\ = 02,rest of data) = P{typei\0\ £ 02,rest of data) 

This assumption about orientations, however, probably does not accurately describe the 
world. Consider, for example, groups formed by taking the convex sections of a star. All 
pairs of such groups fall into type 1 of type 2 , even the non-adjacent ones. More generally, 
it seems that because two groups from the same object must link up eventually, they are 
more likely to point in the same direction. This intuition leads to the conclusion that while 
the above assumption may approximate the right answer, it fails by underestimating the 
probability that groups of objects will fall into a lower type. It underestimates, for example, 
the probability that two groups from non-adjacent sections will fall into type\. 
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The Probability of type 2 or type 3 Occurring 

The analysis of types 2 or 3 occurring closely resembles the previous analysis. We want to 
calculate: 

P(type 2 \rest of data, 01 = 02) 

= P{type 2 \adj,rest of data, 01 - 02) * P(adj\rest of data, 01 = 02) 


+P(type 2 \notadj,rest of data,01 = 02) * P(notadj\re$t of data,01 = 02) 

We know that type 2 can not occur if the groups come from the same convex section of the 
object. 

We have already discussed the problem of deriving P(adj\rest of data, 01 — 02) and 
P(notadj\rest of data,0 1 = 02), explaining why we will avoid it. 

If the two groups of edges come from adjacent sections of the object, they must fall into 
type\ or type 2 . We have already discussed the probability that they fall into type\, so the 
probability they fall into type 2 is just one minus this probability. 

And we have previously discussed the probability that the relationship between two 
groups falls into type 2 given that they come from non-adjacent sections of an object. That 
is, we will assume the two groups have random orientations, and that: 

P(type 2 \notadj,Oi = 0 2 ,rest of data) = P(type 2 \0i 7 ^ 0 2 ,rest of data) 

Similarly, we find that: 


P(type 3 \rest of data, Oi = 02) 

= Pftype^notadj, rest of data, 01 — 02) * P(notadj\rest of data, 01 = 02) 

— Pftype^lrest of data, 01 ^ 02) * P(notadj\rest of data, 01 — 02) 

3.4.2 When Groups Come from Different Objects 

Suppose we assume that two groups that come from different objects have random ori¬ 
entations. Then calculating the probability of those groups falling into a particular type 
becomes a straightforward, although not trivial, problem in geometry. As we saw in the 
discussion of the role of distance in grouping, this idealization of random orientations does 
not completely hold. However, we will proceed under the assumption that it holds most of 
the time, and that when it does not hold, its failure does not affect the outcome much. 
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Recall from our initial analysis of the problem that we want to calculate: 

P(type x \rest of data,01 02) 

We can introduce these problems with a simplified analysis. We will calculate the relevant 
probabilities when a distance of 0 separates the beginning of each group from its end. This 
reflects the situation when groups are small relative to the distance that separates them. 
We will then discuss how to extend these results to the more general case. 

type i: the Simplified Problem 

This problem becomes conceptually simple. We now have two points, each projecting a cone 
across the plane. Or the projection may be just a point, if a group has a finite projection. 
The probability that another small group, randomly located in the image, will fall into the 
projection of a group just equals the percentage of the image that this projection covers. 

Recall that ai represented the angle between the two vectors that form the projection 
of group 1. So, if group 1 is near the center of the image, the chances that the second 
group falls inside this projection are |£. Similarly, the chances of the first group falling in 
the second’s projection are |£. Since the two groups have independent orientations, the 
chances of them having a type\ relationship are a * *f 2 . 

type 2 and type 3 , the Simplified Problem 

It turns out that we can express the probability that the projections of two groups do not 
intersect with the formula: 

¥-°i 

2 7T 

So the probability that the projections of the two groups do intersect is just 1 minus this 
value combined with the probability of type i occurring. Appendix B contains a proof of 
this. 

The Real Problem 

We now want to consider the situation in which the group has some size. This thesis will 
not provide complete solutions to this problem. However, the next sections will discuss how 
we expect those answers to differ from the simplified ones derived above. 

typei, the Real Problem 

When groups may also have a length (expressed by the variables l\ and 1 2 ), this affects 
the probability of one group falling in another’s projection in a number of ways. In the 
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simplified problem, if group 1 fell into group 2 ’s projection, this did not affect the probability 
of group 2 falling into group l’s projection. We will not be able to count on this in the real 
problem. In fact, every orientation of group 1 will not only determine if group 2 falls in its 
projection, but will also determine the likelihood that it will fall in group l’s projection. 

This section will make the point that the greater the projections of the two groups, the 
greater the chances that they will have a type i relationship. On the other hand, the larger 
the groups, the less the chances that they will have this relationship. Furthermore, this 
section will suggest some bounds on the probability of type\ occurring, in the case where 
both groups have infinite projections. 

Both the starting and ending point of group 1 must fall in group 2’s projection for them 
to have a type i relationship. Instead of overlapping just one point, group 2’s projection 
must now sweep over a whole slice of the plane. We will look at the impact of this effect 
when group 2 still has a length of 0 , and then when it does not. 

If group 2 began and ended at the same point, then the lines from this point to where 
group 1 began and ended would form an angle, call it a. a would have a maximum value 
when group 1 faced group 2 directly. At that point, a would equal 2 arcsin|^. As group 1 
tilted with respect to group 2, a would diminish to 0. The probability of groupl falling in 
group 2 ’s projection would be a2 2 ~ a , or 0 if a > a. 2 . 

When group 2 also has a length, however, we must consider the fact that at distance 
d from group 2, its projection sweeps out a larger area. For greater accuracy, we should 
consider not the angle of a group’s projection but its total size. We ignore this subtlety, 
however. 

We must also contend, however, with the complexity that the chances of group 1 falling 
in group 2 ’s projection depends not just on group 2 ’s orientation, but also on group l’s. 
This means that the probability of both groups falling inside each other’s projections is 
not simply the product of two simple probabilities. If group 1 falls in group 2’s projection, 
where in the projection it falls will determine how oblique an aspect group 2 presents to it. 
Furthermore, we must also now consider <13 and 04 , the angle between the projection of a 
group and a line joining its beginning and end. For some values of 03 , for example, the fact 
that group 2 falls in group l’s projection may imply that group 1 presents only a narrow 
view of itself to group 2. These dependencies make a precise calculation difficult. 

However, we can calculate bounds on this probability. Based on the above analysis, we 
see that the best chance group 1 has of fallin g in group 2 ’s projection occurs when it appears 
to group 2 head on, as a point. The worst chance occurs when it faces group 2 directly, 
presenting an aspect of length lx. This suggests that the probability of the groups having 
a type\ relationship has the simplified analysis of the last section as an upper bound, and 
has as a lower bound the product of the two minimum probabilities treated independently. 
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This would tell us that: 


and 


P(type = type\\rest of data) < 


a\ * a2 
47r 2 


P(type = typei\rest of data) > 


a\ - (2 arcsin 02 - (2arcsin 


2tt 


27T 


type 2 and types, the Real Problem 

We do not analyze this more complicated problem. We just point out that larger groups 
will have larger projections, with an increased chance of intersecting. 


3.5 The Distance to Intersections 

This section will argue, somewhat superficially, that when two groups have a type 2 relation¬ 
ship, the greater the distances to the place where their projections intersect, the less the 
chances that the groups come from adjacent sections of the same object. These distances, 
denoted by inti and int 2 , indicate how long the two convex sections that produced these 
groups must be in order to intersect. The longer these sections must be, the larger the scale 
of the object that produced them. This essentially restricts the possible ways that a single 
object could have produced the two groups. 

inti denotes the minimum distance from the beginning or end of group 1 to the pro¬ 
jection of group 2. The projection of group 2 tells us the possible location of any obscured 
part of the convex section that produced group 2. So, if in the scene the two groups come 
from adjacent convex sections of the same object, then some obscured portions of group 
1 and group 2 must continue the groups at least until the point where their projections 
intersect. 

This gives us two reasons for expecting low values of inti and int 2 from groups that 
really come from adjacent sections of the same object. First of all, we expect the convex 
sections of the object to have shorter, rather than longer lengths of perimeter missing 
from the image, because we expect in general that shorter, rather than longer sections of 
perimeter will fail to appear in the image. Secondly, suppose inti has a value of 10 units. 
Then group 1 could come from any section of any object scaled so that it has a diameter 
greater than 10 units. But if inti has a value of 20 units, then only sections scaled to 
exceed 20 units in diameter could have produced group 1, if group 1 does indeed come from 
a convex section adjacent to the one that produced group 2. By limiting the number of 
possible scenes that might have produced an image with adjacent sections causing groups 
1 and 2 we reduce the chances that adjacent sections did produce groups 1 and 2. Both of 
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these reasons support the hypothesis that the chances of two groups coming from the same 
object decrease as inti and int 2 increase. 

On the other hand, a complete analysis would require us to examine the distributions 
of int\ and inti when the groups come from different objects, as well as more thoroughly 
analyzing these distributions when the groups come from the same object. We do not 
attempt such an analysis here. 

3.6 Future Work 

In paring the grouping problem down to just two factors, distance and orientation, we made 
many simplifications and threw away a lot of information present in images. The fact that 
we can use this theory of grouping to build a successful recognition system demonstrates the 
power of these two constraints. However, we should expect to perform better grouping by 
analyzing a more realistic model of the world, and by taking into account more information. 
These two kinds of additions work together, when we try to make use of more information, 
we need a richer model that tells us how to use that in formation. We can see this if we 
try to enhance grouping so that it can take advantage of three dimensional information, 
texture or color information, or shape information. 

We would like a grouping system that works well in three dimensional domains, not 
just on two di mensional scenes. Chapter 9 will show some examples that indicate that 
GROPER’s grouping system does well on images created by three-dimensional objects. But 
surely we can do better with a theory of grouping that takes into account the differences 
between two and three dimensional scenes, or uses depth information derived from images. 

Three dimensional scenes produce two dimensional images with different characteristics 
than the ones produced by two-dimensional scenes. For example, nearby things tend to 
appear in front of and larger than far away things. This might, for example, result in even 
shorter sections of occluded perimeter than with the situation analyzed in this chapter, 
because nothing occludes the nearest and largest objects. A more thorough analysis of the 
situation would tell us how it differed from the one discussed in this chapter. 

We also might want to make use of three dimensional information determined from the 
scene. For example, we might use stereo or motion clues to determine the distance from the 
viewer of different sections of the image. This could provide an important source of clues 
for grouping, but to use this information we would need to know something about how 
three dimensional scenes behave. For example, suppose we see a nearby patch on our left 
and a more distant patch on our right. We might expect, due to this distance, that it is less 
likely that these two patches come from the same object than if they were equally distant. 
But how much less likely? And what other factors influence this likelihood? Suppose the 
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patch on our left slopes away from us at a rate that lines it up exactly with the location 
of the patch on the right. A theory of three dimensional scenes would tell us how to make 
use of this type of depth information to perform grouping. 

We would also like to take advantage of color, shading and texture information when we 
perform grouping. This type of information appears to provide important clues in deciding 
whether sections of an image come from the same object. For example, we would expect 
two sections of an image with the same texture to have a greater likelihood of coming from 
a single object than if they had different textures. But to make use of this information, we 
need a much more elaborate model of objects and of the image formation process. 

We probably can draw the conclusion that similar textures more often come from the 
same object than do dissimilar ones, because we assume that if one section of an object 
has a certain texture, or color, then that texture or color is much more likely to show up in 
other parts of the object. But how much more likely? Does the distance between the parts 
of the object effect this likelihood, and if so, how? What about small changes in texture or 
color? Is a pink section of the image likely to come from the same object as a red section? 
and how does this likelihood differ when both sections are red? To answer these kinds of 
questions we need to have some kind of model of how objects are colored and textured, and 
how colors and texture change or repeat over the surface of objects. 

We also need to know how lighting effects can influence this process. For example, if two 
sections of an image appear to have different textures, we need to know if this means that 
the sections of object that produced them have different textures, or whether a difference 
in lighting might have produced this effect. Land [17] and others have proposed theories 
of computation to explain how we may answer this type of question for color and shading. 
We would like to approach the problem of texture in the same way, with a theory that tells 
us how to determine the chances that two image sections come from pieces of material that 
have similar colors and textures. 

Shape information can also provide important clues in determining the likelihood of 
groups coming from the same object. For example, if two groups have symmetric edges, 
this might incline us to think that they come from the same object. And, in the simplest 
case of symmetry, we would like to know how the presence of parallel edges will influence 
this likelihood. Lowe[l8] has analyzed this problem for a world model containing randomly 
oriented, three-dimensional objects. Also, perhaps asymmetric but still similar shapes will 
provide important clues. Judging the presence of such relationships does not cause much 
of a difficulty, but knowing how to make use of them does. To do so, we would like both a 
more realistic model of the shape of objects, and of their orientations. 

The model of objects used in this chapter assumes essentially random objects with 
random orientations. In such a world, symmetry and similarity of shape occur only through 
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Figure 3.13: The four straight edges group better together because the other edges account 
for the missing perimeter of an object that produced those edges. 

chance, and so occur only rarely both between groups that come from different objects and 
from the same objects. In the real world, however, symmetry and similarity of shape occur 
all the time. On the one hand, a more realistic model of objects will tell us that they often 
contain symmetric sections, having parallel edges especially often. Also, often the same 
shape occurs in different parts of an object. People, for example, usually have symmetric 
faces, and two similar hands that are not oriented to appear symmetric. On the other 
hand, the orientation of objects often conspires to produce symmetry and similarity between 
different objects. Consider parallel edges. My office now contains scores of horizontal and 
vertical edges. Some pairs of these come from a single object, but most come from a wide 
variety of different objects, because scenes constructed by humans tend to contain objects 
oriented to produce many horizontal and vertical edges. We might not want to conclude 
that the parallelism of two edges makes them likely to come from a single object. Also, a 
scene may contain many similar shapes that come from different objects. A forest contains 
many leaves for example. Many problems remain in constructing an accurate theory of how 
to make use of this kind of shape information, because to do so we must understand how 
to model the regularities in our world. 

Finally, this chapter only considered the relationship between two groups of edges in 
determining the probability that they go together. But other edges in the scene will also 
effect this probability. For example, if other edges surround the area in between the two 
groups, this might make it look more like the two groups come from the same object (see 
figure 3.13). This is because the other edges outline an object that would cover up the area 
between the two groups, providing an explanation for the distance that separates them. On 
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Figure 3.15: But when coosidexed as part of a seeae, the free#* from the previous ilhtstra- 
tion may not go as well together. 
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the other hand, if other edges in a scene go particularly well with one group of edges this 
might make it less likely that that group belongs with a different group of edges. We need 
to check not just whether a pair of groups go well together, but whether this pairing will 
lead to a good global interpretation that explains all the edges in the image. Figure 3.14, for 
example, contains two groups of edges that go well together when considered in isolation. 
But when considered as part of an image (figure 3.15), we see that combining these edges 
will not lead to a satisfying interpretation of all the edges in the image. Moving from a local 
consideration of pairs of groups to a global consideration of many groups creates challenging 
new problems. 

3.7 Conclusions 

This chapter has focused on determining how to use distance and orientation information 
when performing grouping. It has reached a fairly simple conclusion about the effect of 
distance: the greater the distance between two groups of edges, the less likely they are 
to come from a single object. Furthermore, it has provided one of the reasons this might 
be true, and made conjectures about bounds on how much this likelihood changes with 
distance. The chapter has also told us how to deal with short distances separating two 
groups of edges; this probability depends on the general likelihood of occlusions occurring 
in a scene. 

The analysis of orientations does not lead to quite as straightforward a solution. We 
have profitably divided orientations into three classes. Other factors being equal, we find 
that some types indicate that groups come from the same object more than do others. 
Furthermore, we see the general effect that changes in the angles and sizes of the groups 
have on the likelihood that the groups come from the same object. In addition, we have 
some bounds on parts of this probability. We have found that a full analysis of these 
probabilities would require us to make assumptions about the kind of objects contained 
in the library, as well as making assumptions about the kind of scenes we will encounter. 
However, we have a sound basis for heuristically choosing probability distributions that 
have the correct general shape. 

As the chapter on the grouping algorithm will make clear, we use the results of this 
chapter to make some informed guesses at the probabilities we want. Although not optimal, 
the heuristics chosen do result in a successful grouping system. Without this analysis, the 
choice of heuristics would not be obvious, at least it was not obvious to the author. So 
the accomplishment of this chapter is that it provides inspiration for a good choice of 
grouping heuristics, and a justification for arguing that these heuristics reflect the effect of 
real constraints of the world. 
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Chapter 4 


The Psychophysics of Grouping 


4.1 Introduction 

The theory of grouping presented in chapter 3 predicts that some combinations of edges 
have a greater likelihood of coming from the same object than do others. When people 
look at images containing edges, they see certain combinations of edges grouped together. 
This chapter points out that the theory of grouping derived in chapter 3 agrees with human 
grouping phenomena. This leads to the hypothesis that people perform grouping as a first 
step towards recognizing objects. According to this hypothesis, they group together the 
edges and other features in an image that have a particularly good chance of coming from 
a single object. Furthermore, we expect that evolution has equipped people to perform this 
task quite well. Following this hypothesis, we would expect any correct theory of grouping 
to accurately predict human behavior. Correct predictions based on the theory of chapter 
3 therefore lend support to that theory, telling us that even though the theory was derived 
from a simplified world, it may still accurately reflect the real world. Furthermore, the 
theory of grouping provides a deeper explanation of human grouping phenomena. We had 
hypothesized that people group together edges likely to come from the same object. But 
now, at least in some cases, we know why these edges are more likely to come from one 
object. 

This chapter makes use of only informal psychophysical evidence. It presents some 
images, and claims that “obviously” people will tend to group together one pair of groups 
of edges over a different pair. “Group together” means that the viewer will experience a 
sense that certain combinations of edges go together well, or that if asked, the viewer would 
affirm that those combinations of edges seem more likely to come from the same object 
than another combination of edges. The reader, of course, will judge for him- or herself 
whether these claims are obviously true. 


66 



Figure 4.1: People tend to form groups out of the nearby edges rather than the more distant 
ones. 

The examples given in this chapter try to isolate the effects of distance and relative 
orientation from other effects. For that reason, we try to avoid using images in which other 
factors might play an important part. For example, we know that people tend to group 
together parallel, symmetric, or co-linear edges. When comparing two different pairs of 
groups of edges, we make sure that either no pair contains such a quality, or that all pairs 
contain them equally. 

In the past, psychologists, and more recently, computer scientists, have investigated 
grouping phenomena. Lowe[18] offers an excellent summary of that research, which this 
paragraph draws upon. The Gestalt psychologists originally investigated grouping phe¬ 
nomena. Wertheimer[26] noticed that people will group together parts of an image, and 
suggested five different principles to explain the phenomena he observed. For example, he 
suggested that people group together nearby things, and similar things. The gestaltists 
also suggested that people group together those parts of an image likely to come from a 
single object. From a computational perspective, Witkin and Tenenbaum[27] argue that 
people’s perceptions make salient the aspects of an image least likely to occur accidentally. 
Parts of an image that have a relationship unlikely to happen by chance probably have 
a related underlying cause. Lowe[18] also follows this line of reasoning. He argues that 
nearby, parallel edges have a high probability of coming from the same object because such 
edges occur only rarely by chance when the edges come from different objects, but occur 
more often when the edges come from the same object. The type of reasoning utilized in 
chapter 3 obviously draws much inspiration from these approaches. This chapter will make 
use of the fact, noticed by Wertheimer, that people group together nearby things. It will 
also discuss the relationship between the orientation constraint developed in chapter 3 and 
other previously studied grouping phenomena. 


4.2 Proximity 

All other factors being equal, people tend to group together nearby edges. Figure 4.1 
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shows a series of parallel edges. People see the nearby edges forming pairs. One might 
instead see the adjacent, but more distant edges forming pairs. These alternate pairs would 
differ only from the pairs people do see in the distance separating the edges. That people 
do not see the image in that way shows that people group together the closest edges in the 
image. 

This agrees with the theory of grouping already presented. That theory suggested that 
nearby edges in an image have a greater likelihood of coming from the same object than 
do distant edges. Of the two rival ways of interpreting the image in figure 4.1, the way 
that people see the image requires an explanation for a series of short sections of perimeter 
that did not produce edges. The alternative grouping of edges requires an explanation for 
longer sections of missing perimeter. The theory of grouping explains why people should 
prefer the first possibility. 

As mentioned before, Wertheimer noticed this grouping phenomena. He did not argue 
explicitly that it arises because nearby edges often come from the same object, however. 
Lowe does present a rationale for the use of grouping based on the proximity of edges. 
He takes quite a different approach to the one presented in this thesis. He argues that by 
grouping together nearby edges that have other qualities, such as parallelism, we can reduce 
the complexity of recognition by limiting the number of groups we must consider. He does 
not argue that some edges axe more likely to come from a single object simply because they 
are nearby. 

4.3 Orientation 

The discussion of orientation in the theory of grouping allows us to make a number of 
predictions about which collections of edges will group best together. Although the theory 
does not tell us how to compute the exact probability that a pair of groups comes from the 
same object, it does offer us some ways to compare different pairs of groups. For pairs of 
groups with the same type of orientation, we know how altering certain variables will alter 
the probability that the pairs come from the same object. For example, we know that the 
larger the angle of projection of two groups, the less unlikely it is that a random orientation 
would produce a typei relationship. So, all other factors being equal, two groups with a 
larger angle of projection and a type i relationship will be less likely to come from the same 
object then two similar groups with a smaller angle of projection. We can also show that 
groups with a typei orientation have a greater chance of coming from the same object than 
do groups with a type 2 relationship, and that groups with a type 3 orientation have the least 
chance of all. Having made these predictions we can then see if they agree with the way 
people group edges. They do. 
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Figure 4.2: Groups A and B have a type i relationship, and seem to go better together than 
groups A and C, which are the same, but oriented to have a type 2 relationship. These in 
turn go better together than groups A and D, which have a type 3 orientation. 

The theory of grouping implies that, all other factors being equal, a pair of groups with 
a lower type number describing their orientation will have a greater likelihood of coming 
from the same object. This is because, for example, a type\ relationship always occurs 
between groups from the same convex section, and can also occur when the groups do not 
come from the same convex section of an object, while a type -2 relationship can only occur 
when the groups come from different convex sections of an object. In other words, lower 
types have more ways of occurring when groups come from the same object. Appendix C 
proves this. 

Figure 4.2 shows a group of edges, labeled A, paired with three other groups. The pairs 
have different types of orientations, but otherwise they are the same. As predicted, the 
groups in type 1 seem to go best together, the groups in type 2 go next best together, and 
the groups with a type 3 orientation seem least likely to come from a single object. 

The theory of grouping also makes some predictions that allow us to compare different 
pairs of groups with the same type of orientation. In particular, we will look at how changing 
the angle of the groups’ projections will effect the likelihood that they come from the same 
object. And when two groups have a type -2 relationship we can predict how a change in the 
distance to the area where their projections intersect will affect this likelihood. 

If two groups have a type\ orientation relationship, reducing their angle of projection 
should increase the probability that they come from the same object, if all other factors 
remain equal. If a group has a smaller angle of projection, then the other group has a 
smaller chance of falling inside that projection, if randomly oriented. So a smaller angle of 
projection decreases the probability of a type 1 orientation occurring if the groups come from 
different objects. It will also decrease the chances of this orientation occurring if the groups 
come from adjacent or non-adjacent sections of the same object. But, because it does not 
reduce the chances of this orientation occurring if the groups come from the same convex 
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Figure 4.3: Three pairs of groups with a type\ orientation. The pairs have the same 
relationship, but different angles of projection. 

section of an object, overall i ncreases - Figure 4.3 provides some examples 

to demonstrate that groups with type i relationships seem to go better together when they 
have smaller angles of projection. 

This analysis explains certain types of very strong grouping phenomena in humans. 
Suppose we have an image consisting of the two ends of a rectangle, separated by some 
distance, as in the left-most pair in figure 4.3. These two groups of edges seem to have 
an extremely strong bond. We can explain this bond as the limiting case of the situation 
considered above. Those two groups have a zero probability of having a type i orientation 
if randomly located, because this orientation requires their ends to line up perfectly. The 
only plausible explanation for this orientation, according to the theory of grouping, is that 
the two groups come from the same convex section of the same object. 

The theory of grouping has also predicted that when two groups have a type 2 rela¬ 
tionship, increasing the distance to the intersection of their projections will decrease the 
probability that they come from the same object. Figure 4.4 provides examples of pairs of 
groups that differ only in the distance to the intersection of their projections. These images 
appear to confirm the predictions of the theory of grouping. 

Past work has not paid much attention to grouping that results from the orientation of 
edges. This may be due to the fact that such grouping phenomena does not produce nearly 
as striking an effect as do other phenomena. Figure 4.5 contains examples of the grouping 
phenomena discussed by Wertheimer. These examples all cause strong grouping effects. In 
contrast, a pair of groups with a type 2 relationship and short distances to the intersection of 
their projections do not go together that strongly, just more strongly than they would with 
long distances to their intersection. For this reason, most work on grouping has focused on 
special relationships, such as symmetry, which cause particularly strong grouping. While 
these effects are important, they do not allow us to evaluate arbitrary collections of edges 
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Figure 4.4: Groups B and C seem to go together better than groups A and B. The same 
distance separates each pair, but the distance to the intersection of their projections differs 
greatly. 



Figure 4.5: Three of the striking grouping phenomena discussed by Wertheimer. At the 
top, we group together the nearby dots. In the middle, we consider the curved edge as a 
single contour. We do not see three separate closed areas with three separate curved ends. 
On the bottom, we group together the similar dots. 
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to see how well they go together. It is because this thesis aims to find large groups of edges 
in any kind of image that it has needed to examine less forceful factors in grouping. 


4.4 Future Work 

This chapter has presented some psychophysical predictions we can derive from the theory of 
grouping, and shown that they match experience. This attempts to show that an analytical 
examination of grouping can help to explain human grouping phenomena. But we can also 
take a different approach. If we accept the hypothesis that human grouping mainly results 
from an effort to find sections of an image that all come from the same object, then we can 
use human grouping as a basis for determining how to perform grouping with a computer. 
By carefully analyzing the way people group edges, we can resolve some issues left fuzzy 
by the theory of grouping. 

For example, experiments that simultaneously vary the distance between two groups 
and their relative orientation could tell us exactly how to combine these two factors to 
determine the probability that two groups come from the same object. In chapter 3 we 
only derive bounds on the way this probability changes with distance, or with the angle 
of projection of groups. Based on these bounds, we can not always accurately choose one 
pair of groups over another. If one pair is closer together, but has a type i relationship with 
larger angles of projection, experiments could tell us how people weight these two factors to 
choose one pair over another. We might also use psychophysical experiments to determining 
exactly how changes in the distance to the intersection of projections should change the 
probability distributions we want. In general the theory of grouping provides the general 
shapes of distributions, but experiments might provide more complete and accurate results. 

Psychophysical experiments might also test the fundamental hypothesis that people use 
grouping as a first step in object recognition. We might present people with images in 
which the edges they naturally group together do not really come from the same object, 
and see if this slows down object recognition. 
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Chapter 5 


A Grouping Algorithm 


5.1 Introduction 

Once we have a theory of grouping, we still need to determine how to apply that theory to 
create a grouping algorithm. The theory of grouping presented previously, while providing 
some clues about how to compare different pairs of groups, does not tell us everything we 
need to know. This chapter describes how GROPER makes such a comparison based on 
these clues. But such a test does not tell GROPER how to build up groups of edges large 
enough to help it recognize an object. So this chapter also describes how to use the theory 
of grouping to develop some short-cuts for creating helpful groups. 

GROPER performs grouping in three stages. First, it combines nearby edges into 
convex sections. These convex sections of edges then become the new primitives which 
GROPER uses for grouping; it never looks back at the initial edges. GROPER forms these 
convex sections because they fit the criteria for good groups which we can deduce from the 
theory of grouping. Secondly, GROPER considers every pair of these primitive groups to 
find the two groups which seems to go together best, and then combines them to form a 
new group. Performing this step involves deriving a comparison metric from the grouping 
theory. Thirdly, GROPER sometimes wants to extend one of these new groups, finding a 
third group which goes well with it. This may happen when indexing into the library of 
models reveals that a pair of groups match a number of different known objects. Together, 
these three stages feed the recognition section of GROPER, and respond to its results. 


5.2 Forming Convex Sections 

GROPER begins the grouping process by quickly forming connected, or nearly connected, 
convex sections of edges. This speeds things up a lot. It is a computationally inexpensive 
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Figure 5.1: On the left are five image edges. On the right, the two groups formed by 
GROPER out of these image edges, displaced slightly from their location on the left. The 
shaded area indicates the sides of the edges on which GROPER has decided the figure lies. 
Notice that an edge can appear in more than one group. 

process that allows GROPER to then consider combining groups of two, three or more 
edges, instead of dealing with only single edges. Combining a pair of single edges results in 
two edges that will usually match too many different objects to be helpful. But combining 
a pair of convex sections may result in something quite useful for recognition, because it 
contains more inf ormation. Also, this process creates groups of edges about which we can 
make figure/background judgments. With a single edge, we cannot determine this. But two 
connected edges appear convex only when we assume that the object lies on a particular 
side of the edges. However, beginning grouping this way also has a drawback. We must 
make sure that these convex sections actually contain edges likely to have come from a 
single object. GROPER uses these convex sections as its new primitives. Once it forms 
these sections, it never looks at the isolated edges again. So if GROPER incorrectly groups 
together edges that do not belong together, it may never be able to use those edges to help 
it find objects. 

As input, GROPER uses straight line segments which approximate the edges that an 
edge detector has found in the image. This thesis will not discuss either the edge detec¬ 
tion or the straight line approximation steps; many standard methods exist for solving 
both problems and GROPER makes no contribution in either area. For each line segment 
GROPER does not know on which side the object lies, and which side forms the back¬ 
ground of that object. As output, GROPER produces groups of line segments. Each group 
designates an inside and an outside for each edge in the group. Each group of edges forms 
a connected, or nearly connected, convex curve. One edge may appear in more than one 
group, having a different inside and outside in different groups. Figure 5.1 shows some line 
segments and the two convex groups formed from them. 

GROPER performs this step in two phases. First, it forms nearly connected “strings” 
of line segments. Then, it breaks up these strings into convex sections. GROPER forms 
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Figure 5.2: This shows the decomposition of a string of connected edges into two groups. 
Notice that in part five a new convex group is started using the same edge which ends the 
last convex group. 

strings of lin e segments by connecting two segments whenever they have end points closer 
to each other than to any other end points of line segments. By any other end points, 
we also include the other end points of the two fine segments considered for connection. 
So GROPER never connects two line segments if the distance between their end points 
exceeds the length of either line segment. GROPER “connects” two line segments by just 
putting them together in a list, ordered appropriately. Each segment appears in exactly 
one list (although GROPER may later break this list so that it appears in two groups), 
and a list may contain one line segment, or more than one. A line segment may connect 
to two others, one at each end. We do not allow an end point to connect to more than one 
other segment to avoid creating too many strings, but, of course, sometimes this leads to 
mistakes. 

GROPER then determines the figure and background sides of each edge by breaking the 
strings of line segments up into convex groups. GROPER simply follows the edges along a 
string. As long as the edges form a convex group, GROPER does not touch them. Adding 
an additional edge might make a convex interpretation impossible. For example, three 
edges might make a Z shape. In that case, GROPER takes the initial group of edges, and 
makes them a separate group. The first two edges of the Z would form a separate group, 
for example. GROPER then takes the last of these edges, adds the new edge on, and forms 
a new group containing two edges so far. The last two edges in a Z would also form a 
group. This results in one edge belonging to two groups, having a different figure/ground 
interpretation in each group. This second group of two edges may then have more edges 
added to it as GROPER continues to explore the rest of the edges in the string. Diagram 
5.2 gives an example of this process. 

This initial step produces groups of edges which have a particularly good chance of 
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Figure 5.3: On the left are the perimeters of two different objects, with a section missing 
where one lies on top of the other. On the right, the type of edges which GROPER’s 
pre-processing typically produces. The dotted lines indicate two convex, nearly connected 
groups of edges. 

coming from the same object. The previous chapter told us that the less the distance 
which separates two edges in an image, the greater the chances that they come from a 
single object. By connecting together the closest end points, we choose the most likely 
connections. Furthermore, we take into account the null hypothesis that a line segment 
represents the only section of an object’s perimeter which produced an edge in the image. 
In that case, the length of missing perimeter equals the length of the line segment. So we 
only connect line segments whose distance apart is less than that distance. 

Also, the previous chapter makes the convex interpretation of adjoining line segments 
seem the most likely. We assume that objects have relatively few points of concavity. This 
makes it more likely that two nearby edges come from the same convex section of an object 
than from adjacent convex sections. If we interpret the edges so that they form a concave 
shape, then they would have come from two adjacent convex sections of an object. 

We might put forward an additional argument in favor of this grouping step, as well. 
Some authors (see Schwartz and Sharir[22], for example) have used the fact that in some 
circumstances, convex sections of edges are extremely likely to come from a single object. 
Two objects together can produce a convex outline only when they fortuitously line up. Two 
squares, for example, form a convex contour together only when they each have a corner 
located at the same point. We do not make use of this argument, however, because it holds 
true only in the presence of conditions which this thesis does not assume. If we do not 
know figure from ground, or if the perimeters of objects do not produce continuous edges, 
then two different objects may often combine to produce connected, or nearly connected 
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convex contours. For example, two occluding objects always produce a pair of convex 
edges, if we mistake figure and background in the edges from both objects. If some of an 
object’s perimeter fails to produce edges, this can happen in other ways as well. Figure 5.3 
shows the complete perimeter of two occluded objects and the edge line segments which 
GROPER’s pre-processing steps typically produce from such a scene. The dotted lines 
show two collections of convex, nearly connected edges. Neither of them comes from a 
single object. 

Making these convex groups of edges greatly reduces the amount of work left to the 
grouping module. Without these groups we would have to look deeply into a search tree 
of possible combinations of edges. Furthermore, each edge would have two possible inter¬ 
pretations, since we would have no way to tell the figure side of a single edge from its 
background side. Instead we usually only need to find the best combination of two of these 
convex groups to find an object. Chapter 9 provides data which shows that groups of four 
or five edges usually match few objects in the libraries GROPER uses. And, even though a 
line segment can appear in two groups, GROPER usually produces fewer primitive groups 
of edges than line segments, since many groups contain three or even four edges. So this 
initial simple grouping step greatly reduces the amount of work needed to find objects by 
quickly combining the edges most likely to come from the same object. 

This section of GROPER puts together the best small groups it can. It does make 
mistakes. But this phase of GROPER works quickly. For an image with about one hundred 
edges, this phase of GROPER takes less than ten seconds to perform on a Symbolics 3600 
Lisp Machine. Furthermore GROPER performs this task with an extremely inefficient 
algorithm which carries out an O(N) process in 0(N 2 ) time. We have not bothered to 
make this phase of grouping faster, since it still represents only a small fraction of the total 
recognition time. 


5.3 Combining Convex Sections 

Once GROPER has formed these primitive convex groups, it then looks for larger groups 
which can help it to recognize an object. A single convex group may prove large enough 
to allow recognition of an object. But usually, GROPER must combine at least two of 
these groups. GROPER does this in a quite simple way. It considers every possible pair of 
groups of edges, to find the pair most likely to come from a single object. The difficulty of 
this task arises in determining how to evaluate these pairs of groups. To do this, GROPER 
calculates an evaluation metric based on the theory of grouping discussed in the previous 
chapter. 

First of all, however, GROPER examines the simple convex groups to see if any of 
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them might allow it to recognize an object. GROPER can use a simple group to index into 
its library of objects, and perform verification if the results look promising. To produce 
promising results, a group must match only a few different sets of edges among the known 
objects. If a group of edges matches many sets of object edges, GROPER goes back to its 
grouping module to try to extend the group. Since any collection of three or fewer image 
edges will usually match quite a few objects, GROPER does not even bother to perform 
indexing with such small groups. GROPER does index into its library of objects using any 
simple groups with four or more edges, and can recognize objects in this way. 

After this process, GROPER next looks for the most promising pair of simple groups. 
To find this pair, GROPER looks at the distance and relative orientation of each pair of sim¬ 
ple groups. For each pair GROPER calculates four probabilities: P(d\0\ = 02)iP(d\0\ ^ 
02),P{type, intl,int2\Oi = 0 2 > d, Z l5 / 2 , ai,a 2 ,a 3 ,a 4 ), and P(type, inti, int2 \0\ ^ 0 2 , 
d,l\, Z 2 , ax,a 2 , 03,04), and combines them to determine a metric which reflects the proba¬ 
bility that that pair of groups came from the same object. 


5.3.1 Distance, the Same Object 


First of all, GROPER estimates P(d\0\ = 0 2 ). It does this with the function: 

(MaxDiameter — d) 2 


P(d\0\ = O 2 ) 


M axDiameteP 


3 

where MaxDiameter is just the size of the image. This function reflects the occlusions which 
arise in a library of objects which each have a uniform distribution of distances between 
pairs of points, when we then take those objects at uniformly distributed scales. That is, we 
take a uniform distribution, average it over all scales, then square it. Figure 5.4 compares 
this distribution to the worst case distribution produced by a circle, and to the distributions 
created by a few randomly chosen polygons. 


5.3.2 Distance, Different Objects 

The theory of grouping has also told us that we can find the distribution of distances between 
groups from different objects by using a linear combination of the above distribution, and 
a distribution which reflects the distances between randomly oriented groups of edges. 
For small groups of randomly oriented edges, we can approximate the distribution of the 
distances between them with the function: 

P(d\ 2 wd _ 2d 

7r Max Diameter 2 MaxDiameter 2 

for d < MaxDiameter. And the experiments depicted in figure 3.12 show that this distri¬ 
bution does not differ much from the distribution when the groups are not small. So, if we 
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Figure 5.4: On the left, the probability distribution which GROPER uses for the expected 
distance between groups of edges from the same object. Then, the distribution we would 
expect from a library of just circles, or just one of the three random polygons shown in 
figure 3.7. 


linearly combine this distribution with the one given in the previous subsection, we find: 

(MaxDiameter - d ) 2 2d 

2 ) — * MaxDiameter 3 * MaxDiameter 2 

3 

That is, we approximate the distribution of distances between groups from different ob¬ 
jects by combining two distributions. The first reflects what happens when the groups were 
created by the intersection of the two objects. In this case, we have the same distribution 
as when the groups come from the same object. The second distribution increases linearly, 
and reflects what happens when the two groups have independent, random locations. The 
way in which we combine these two distributions should then reflect the likelihood of oc¬ 
clusions occurring in the scenes GROPER will examine. GROPER uses the arbitrarily 
selected value of .2 for k, which seems to work well. 


5.3.3 Orientation, the Same Object 

GROPER calculates the probabilities of different orientations occurring based on their For 
example, if two groups have a type one orientation, recall that: 

P{type\\0\ = C> 2 ,d, restofdata ) = 

P(typei\same, restofdata) * P(same) + P(type\\adj, restofdata ) * P(adj) 

+P(type\\notadj, restofdata) * P(notadj ) 

where “same” denotes that the two groups originally came from the same convex section 
of an object, and “adj” and “notadj” have comparable meanings. So to calculate all the 
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probabilities we need, we must know the chances that two groups of edges, randomly 
selected from one object, come from the same, adjacent, or non-adjacent groups of edges, 
and we must know the nine conditional probabilities which combine these three ways the 
groups can originate with the three orientation types which reflect the way they look. 

We arbitrarily choose the value | for P(same), P(adj) and P(notadj). Although this 
probably does not reflect reality very well, this does not matter too much. We want proba¬ 
bility distributions with the right general shape, but we do not care if some of their details 
are not optimal. A choice of | guarantees that GROPER takes each of the three possibilities 
into account in determining the distribution it wants. 

Of the remaining nine probabilities, four have obvious values. For example, P(type i| 
same, restofdata) = 1. That is, groups of edges which come from the same convex section 
cannot look as if they could not have come from the same convex section. Of course, 
in determining the type of two groups, we must take possible error into account. If two 
groups really do not fall inside each other’s projections, but allowable amounts of error 
would have them falling inside each other’s projections, GROPER considers them to have 
a typei relationship. So if they do have a typei relationship, the groups really could 
not have come from different convex sections of the same object. Similarly, we find that 
P{type 2 \same) = 0, P(type 3 \same) - 0, and P(type 3 \adj) = 0. We must approximate the 
remaining five probabilities. Chapter 3 concluded that: 


P{type 3 \adj,0\ = 0 2 , restofdata) = 


_ P{typei\Oi f 0 2 ) _ 

P(type 3 \Oi £ 0 2 ) + P{type 2 \Oi £ 0 2 ) 


P{type\\notadj,0\ = 0 2 , restofdata) = P(typei\Oi 0 2 ) 


and 


P(type 2 \notadj,Oi = 0 2 , restofdata) = P{type 2 \0\ 0 2 , restofdata) 


And, of course: 


P(type 2 \adj) = 1 - P{typei\adj) 


and 

P(type 3 \notadj) = 1 - P(type 2 \notadj) - P{type\\notadj) 

In addition to the type of orientation, GROPER also makes use of the distance to the 
intersection of projections when two groups have a type 2 relationship. Recall that we call 
these distances inti and int 2 . This might seem like an esoteric piece of information of which 
to make use, particularly since we do not have a well developed theory of inti and int 2 . 
However, the informal psychophysics discussed in chapter 4 should convince the reader 
of the importance of these variables. And using this information significantly improved 
GROPER’s grouping. 
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When using intersection information, GROPER first looks for some special cases which 
it treats separately. First of all, GROPER sets a maximum value on the size that an object 
part can have. It sets this value at about half of the size of the image. If either inti or 
int 2 exceeds this size, then the two groups can not come from adjacent sections of the same 
object after all. Secondly, if only a short distance separates the two groups, GROPER 
ignores the values of inti and int 2 as unenlightening. For example, if two groups touch 
where they end, their intersection distances will always have the value zero, which tells us 
nothing about the likelihood that they came from the same object. Even when the groups 
just end near each other, almost any orientation of the groups will produce inti and int 2 
with values close to zero. So the random orientations which the groups presumably have 
when they come from different objects will usually produce low values for the intersection 
distances. Since we expect groups from the same object to usually have low intersection 
distances, situations which also produce low intersection distances when the groups come 
from different objects will not provide any useful information. In such cases, GROPER 
does not use the values of inti and int 2 . 

When GROPER does use inti and int 2 , it does so such that lower values for these 
variables indicate an increased likelihood that the groups come from the same object. Fur¬ 
thermore, the discussion in chapter 3 suggests that this likelihood decreases linearly with 
the distance. For that reason, GROPER uses the following distributions: 

„ , 2( Max Part Diameter - x) 

Piinti = x\Oi = 0 2 ) = -i——— -—-- 

Max Part Diameter 

When determining P(inti = x\Oi ^ O 2 ), GROPER uses a uniform distribution. Together, 
these distributions have the effect that as the distance increases to the place where the 
projection of groups intersect, the judged likelihood that the two groups come from the 
same object will decrease linearly. 

5.3.4 Orientation, Different Objects 

When considering the possibility that two groups come from different objects, GROPER 
decides on the chances that, when randomly oriented, those two groups would produce a 
typei orientation, a type 2 orientation or a type 3 orientation. In doing so, GROPER follows 
the distributions suggested in chapter 3. 

To determine the probability that two groups would have a type 1 relationship, GROPER 
compares the angle of each group’s projection to the angle which the other group presents 
to it, multiplying together the probability that group 1 falls into group 2 ’s projection and 
the probability that group 2 falls into group l’s projection, as if they were independent. 

GROPER determines the probability that group 2 falls in group l’s projection in two 
different ways, depending on whether group 1 has a finite or infinite projection. 


81 




Figure 5.5: Two groups with a type i relationship, “t” represents the aspect the second 
group presents to the first group, “ai” is the angle of projection of the first group. The 
probability that the second group falls in the first’s projection is approximated by 

If it has an infinite projection, GROPER subtracts the angle which group 2 presents 
to group 1 from the angle of group l’s projection. It calculates the angle group 2 presents 
by taking the angle formed by a line from the first point in group 2 to the first point in 
group 1, and a line from the last point in group 2 to the first point in group 1. Figure 5.5 
illustrates this calculation. GROPER then divides the difference of these two angles by 2 x. 
This roughly approximates the probability that group 2 would continue to be in group l’s 
projection if we rotated group 1 randomly, without altering its distance from group 2 . 

If group 1 has a finite projection GROPER does something similar. It first finds the 
point where group l’s first and last edges intersect. Then, it rotates group 2 around group 
1 so that it falls inside groups l’s projection. It then measures the size of the face that 
group 2 presents to group 1 by taking the angle between a line from the start of the rotated 
group 2 and the intersection point, and a line from the end of the rotated group to the 
intersection point (see figure 5.6). Finally, it subtracts this angle from the angle of the 
projection, and divides by 2w. This approximates the probability that group 2 would fall 
in group l’s projection if we randomly rotated group 2 about group 1 . 

GROPER estimates the probability that two groups will have a type 3 relationship using 
the formula presented in chapter 3. That chapter presented a solution to the problem when 
the groups are small relative to the distance between them. When a\ and 02 represent the 
angles of the projection of the two groups, then: 

— — (1 — (l -I- aia2 

P{typez\0\ ^ O 2 ,restofdata ) = —-----— 

27T 

For this problem, if a group has a finite projection, GROPER just uses 0 as that group’s 
angle of projection. 

Finally, to determine the probability of type 2 occurring, GROPER just subtracts from 
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Figure 5.6: The group on the left has a finite projection. After rotating the other group to 
fall in this projection, “t” represents the aspect the second group presents, and “ai” is the 
angle of projection of the first group. The probability that the second group falls in the 
first’s projection is again approximated by 

1 the probability that type x or type 3 will occur. These probabilities, like most of the ones 
GROPER uses contain a certain amount of guesswork. We justify their use on two grounds. 
First of all, some real insight into the physical processes that produce images lies beneath 
this guesswork. And secondly, the grouping system that results from these guesses performs 
well. Chapter 9 will present examples of this performance. 


5.4 Creating Larger Groups 

Sometimes, after combining two different groups of edges, indexing will show that a large 
number of sets of edges from different objects might have produced all the edges belonging 
to the two groups. In such a case, we do not want GROPER to go back and find a different 
promising pair of groups. Instead, GROPER considers proposing for indexing all triples 
of groups which contain that pair of groups. It may propose one of these triples in two 
ways. First of all, it estimates the probability that all the edges in these three groups come 
from the same object. This allows it to compare these triples against the unexplored pairs 
of groups. But additionally, GROPER’s grouping system tells its recognition system to 
explore five promising triples of groups before it considers any more pairs of groups. We 
have chosen the number five arbitrarily, any other small number might do as well. This 
prevents GROPER from nearly finding an object, and then wandering off to explore other 
interesting parts of the image. 

GROPER estimates the probability that three or more groups of edges come from the 
same object by combining the probabilities that pairs of groups could have come from the 
same object. This allows it to compare triples of groups to each other and to pairs of 
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Figure 5.7: Six ways to combine three groups of edges. The thick lines represent groups, 
the thin dashed li nes show how they connect. 


groups. The probability that groups one, two, and three come from the same object is just 
the probability that groups one and two come from the same object, times the probability 
that groups two and three come from the same object, given that one and two do. We know 
how to estimate the probability that groups one and two come from the same object, we 
must consider how knowing whether these two groups come from the same object effects 
the likelihood that a third group comes from this object as well. 

Recall from chapter 3 that our estimation of the likelihood that groups one and two 
come from the same object was really based on the likelihood that a path exists from one 
group to the other along the perimeter of an object, and that no intermediate section of 
perimeter has produced an edge in the image. We have six ways we can string these three 
groups together. Figure 5.7 illustrates these six possibilities. GROPER considers all six of 
these possibilities for each triple of groups it wants to consider, and picks the most likely 
ones to present to the indexing module. 

GROPER breaks the problem up into these six parts, because it then assumes that to 
determine the likelihood that group one connects to group two, and group two connects 
to group three, it only needs to compute the probabilities of these two events separately, 
and then multiply them together. GROPER assumes that whether or not group one is 
closest to group two, and from the same object, does not effect the likelihood that group 
three is closest to the other end of group two. This assumption follows the whole spirit of 
this thesis which assumes that objects are in a sense random, and that what happens in 
one distant part of an object will not effect the likely appearance of another part of the 
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object. It would also be hard to see how we might use the information that group one and 
two come from the same object to influence our assessment of group three’s likelihood of 
coming from that object. 

So to determine the likelihood that group one connects to group two, which connects 
to group three, GROPER just multiplies together two probabilities. Calculating these two 
probabilities differs only slightly from the methods previously described to evaluate a pair 
of groups. In evaluating a pair of groups, GROPER used the minimum distance between 
the end of either group and the beginning of the other. In this case, GROPER knows that 
the end of group one is supposed to connect to the beginning of group two, and the end of 
group two is supposed to connect to the beginning of group three. So GROPER uses these 
distances. Other than that, GROPER calculates the two probabilities just as the previous 
section has described. 


5.5 Summary 

This chapter has explained how to build a grouping module based on the theory of grouping 
previously presented. In it we have had to make a few additional arbitrary decisions, for 
example, choosing the value of | for the probability that two groups from the same object 
come from adjacent convex sections of that object, or choosing \ for the value used in 
arriving at the distribution of distances between groups from different objects. We did not 
use these values to fine-tune GROPER, however. In most cases they are the first values 
picked, based on guesswork and a rough idea of what was needed. Probably any similar 
values would result in a grouping system that performs as well. In spite of this guesswork, 
the grouping system described in this chapter relies mainly on the insights derived from the 
theory of grouping. The geometry of image formation has told us how to perform grouping. 


5.6 Future Work 

Two types of future work could improve the grouping algorithm presented in this chapter. 
First of all, at least two ways exist for improving the accuracy of this grouping system. 
Secondly, many ways exist for speeding up this algorithm. The existing implementation 
aimed mainly at testing the effectiveness of the evaluation metric at producing correct 
groups of edges, not in providing an efficient implementation. Efficiency was considered 
only when a slow system hindered development efforts. 

GROPER’s grouping system tends to make at least two kinds of errors which humans 
probably would not make when looking at images. First of all, GROPER will occasionally 
incorrectly combine some edges into the same convex section, hindering it from finding an 
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Figure 5.8: Dotted lines show straight line approximations to the edges found in an image. 
GROPER incorrectly combined the two solid edges into a single, convex section. 

object which contains only some of those edges. Figure 5.8 illustrates an example of this. 
Secondly, GROPER’s grouping system will often select a pair of groups which, in isolation 
seem to form a good group, but which do not seem to work well together when taken in 
the context of the entire image. Figure 5.9 illustrates this problem. Modifications to the 
grouping system aimed at solving these problems could improve its accuracy. 

The first type of problem comes about because, although GROPER forms primitives out 
of the edges most likely to come from a single object, sometimes less likely interpretations 
are correct. GROPER tends to make two kinds of mistakes. First of all, sometimes the 
nearest edges did not actually come from the same object. Secondly, sometimes an object 
produces extended concave sections of edges. These problems occur rarely enough so that 
GROPER still works effectively in spite of them. But we do have a trade-off available. We 
could produce a more accurate system by producing more primitive groups or smaller ones, 
requiring more computation to find correct larger groups of edges. 

We could handle nearby edges which do not come from the same object in two different 
ways. First of all, we could place stricter limits on the distance that we allow to separate 
two edges which GROPER will combine. Instead of always picking the most likely pairing 
of edges, we might only combine edges if they seem considerably more likely to go together 
than with any other edges. Often two edges have end points quite close to each other in 
an image, and a third edge has an end point also near these two. We might argue that in 
such cases, we have more than one quite likely way to combine edges, so we can not choose 
any with certainty. This would discourage us from making any questionable combinations 
of edges, producing more, smaller groups of edges about which we felt more certain. On 
the other hand, when confronted with several different options, all almost equally likely, we 
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Figure 5.9: GROPER decided that the two groups shown go well together. While they 
might appear to go well together when looked at in isolation, people would not consider 
them a promising group when looking at the entire image. 

might produce several different primitive groups of edges. This would increase the number 
of primitive groups, but still give us large groups with which to work. Either approach 
should produce a more accurate grouping system. When GROPER currently incorrectly 
combines two edges into the same group, it may lose the ability to use those edges to 
recognize an object. By creating smaller groups, or more alternate groups, this would 
happen less often. 

A series of concave edges can also produce problems for GROPER. As figure 5.10 shows, 
a single concavity in an object’s perimeter will not cause problems. Although GROPER 
will form an incorrect group out of the two line segments before and after this point of 
concavity, it will also form two correct groups which, when combined, account for the 
object’s perimeter. However, a series of concavities, such as those produced by the gripping 
part of a wrench, will cause GROPER to form a single, large group, with figure and ground 
reversed. GROPER can then not use that group to correctly recognize an object, since the 
recognition module uses an incorrect surface normal for these edges. We might overcome 
this problem by creating a second, concave group out of every convex group. This would 
increase the amount of work needed to find good pairs of groups, but would avoid errors 
which occasionally crop up. 

Reversing figure and ground can lead GROPER’s grouping module to suggest groups 
that did not really all come from a single object. Such mistakes do not usually lead to the 
incorrect identification of objects, but they do waste a certain amount of GROPER’s time. 
If an object has a point of concavity, it will consider the possibility that the two edges 
before and after this point form a convex pair pointing out from the object. When two 
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Figure 5.10: Two strings of image edges are on the top. The primitive convex groups they 
produce axe on the bottom. On the left, even though one group has figure and ground 
reversed, all edges appear in a group which has the correct interpretation. On the right, 
three edges are lost to incorrect figure/ground judgments not made correctly in any group. 


objects with concavities are nearby, only a short distance may separate two such groups. If 
they point at each other as well, GROPER will decide that they probably come from the 
same object. And, in fact, taken in isolation the edges do look promising. In the context of 
the entire image, these edges do not seem quite as good for two reasons. First of all, edges 
from each of the two groups will also usually group well with other edges from the object 
they really come from. So we might want to make the metric assigned to a pair of groups 
depend also on how well the edges in those groups go with other groups. Secondly, a pair 
of groups from nearby objects, with figure and ground reversed, will usually not go well 
with other groups of edges from either object because the other groups will usually have 
figure and ground assigned correctly. So, we might also want GROPER to judge a pair of 
groups partly on how well we could extend that pair with an additional group. These two 
approaches should improve GROPER’s performance at telling figure from ground, without 
requiring a pre-processing step to determine this. 

Two straightforward methods could also increase GROPER’s speed. The theory of 
computation of grouping provides the location of groups likely to go well with a particular 
group. We know that groups near a particular group, or in its projection will tend to have 
the highest evaluation metric. Groups in typei can have high metrics even when far apart, 
because they point at each other in an unlikely way. Consider, for example, the two ends 
of a long rectangle. But distant groups with a type-i or type 3 orientation cannot have high 
metrics. Yet GROPER spends most of its time evaluating the metric for every pair of 
groups. Instead, GROPER could start by looking at nearby groups, and groups which fall 
inside each others projections. Only if no such pairs of groups receive high metrics would 
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GROPER need to continue looking at the less likely candidates. By ordering the search for 
good pairs of groups, GROPER could avoid ever needing to evaluate the metric for most 
pairs of groups. 

The grouping problem also lends itself naturally to a parallel solution. Most obviously, 
GROPER could find the metrics that describe each pair of groups in parallel, since this 
operation contains no dependencies. Obviously, implementing good parallel algorithms 
for recognition requires much more work than this. Parallel vision algorithms have had 
their best success on problems which can be solved by analyzing only local sections of an 
image, separately, although Mahoney[l9] describes some work on parallel implementations 
of higher level processes. While not as naturally parallelizable as edge detection, we just 
wish to suggest that grouping, too, has an inherently local character which makes it an 
attractive candidate for parallel implementations. 
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Chapter 6 


Indexing 


Once GROPER has collected a likely group of image edges, it needs to determine which, 
if any, object’s edges might have produced those image edges. GROPER does this by 
determining some properties of that group of image edges, and then looking up in a table 
to find modeled object edges that might produce image edges having those properties. 
This approach relieves GROPER of the necessity of explicitly considering every possible 
combination of object edges. Instead, with indexing, GROPER only needs to consider 
combinations of edges that have the right properties. 

GROPER’s indexing module uses as input, in a pre-processing step, modeled object 
edges, and in a run-time step, a group of image edges. GROPER obtains the object edges 
from a human operator, who provides lists of straight line segments that approximate the 
perimeters of the objects for which GROPER will look. Along with these line segments 
comes information about their outward-pointing normals, that is, on which side of the line 
segment the object lies. Furthermore, a human operator provides GROPER with estimates 
of the maximum likely error that will later occur in sensing these edges. Error can occur 
in the location or the orientation of an edge. The run-time input also consists of a list of 
l in e segments and their normals, this time provided by GROPER’s grouping module, as 
described in the previous chapter. GROPER uses the pre-processing input to build up a 
table describing the relationships between pairs of edges, it uses the run-time input to look 
up suitable combinations of edges in this table. 

When given a group of image edges, GROPER’s indexing module responds with lists of 
line segments from object models that might have produced these edges, allowing for some 
error, or partial occlusion. 

We have priorities on the way GROPER performs this task. We want to minimize 
the mistakes it makes, of course, but if indexing turns up some groups of object edges 
that could not have produced the group of image edges in question, this will not hurt 
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GROPER’s performance as much as missing some correct matches. If indexing failed to 
produce a correct match, GROPER would have no way to recover from such an error, and 
would fail to recognize an object that produced that group of image edges, unless that object 
produced another set of edges that GROPER’s grouping system could turn up later. It does 
not hurt GROPER’s performance nearly as much if its indexing component turns up some 
groups of object edges that could not really have produced the image edges in question, 
because the subsequent verification step will weed out these mistakes. Such mistakes will 
waste some time by causing GROPER to attempt to verify them, but these attempts should 
fail. GROPER’s accuracy may also suffer a little from such mistakes, because its decision 
as to whether to perform verification on the results of indexing will depend on how many 
matches indexing turns up. If indexing turns up extra, incorrect matches, GROPER will 
make a less well-informed decision about whether to try to verify the results of indexing. 
But producing incorrect matches with indexing will less often cause GROPER to waste 
time or miss an object than will failing to produce correct matches. 

Also, GROPER will require pre-processing space and time to build the tables used in 
indexing. But the time required to build up a table for indexing will not concern us much 
compared to the time required to perform indexing steps at run time. We must require that 
GROPER use only a reasonable amount of space for this table, however. If the entire table 
can not fit in primary memory, this will slow down run-time performance. So, GROPER’s 
indexing focuses on minimizing the amount of processing required at rim-time, and avoiding 
missing possibly correct matches. 


6.1 GROPER’s General Approach to Indexing 

Because GROPER makes use of straight line approximations to curved surfaces, it can 
use a method of indexing that takes advantage of simple descriptions of groups of edges. 
We can precisely describe the relationship between a group of straight lines with a limited 
number of parameters. For example, we can fully describe two lines with five parameters. 
We can then build a five dimensional table using the object models, with an entry for each 
set of five parameters that two edges of the object might produce. When given a pair of 
image edges, we would then just need to compute the parameters that describe them, and 
look up in the table to find all pairs of edges that could produce those five parameters. This 
brief description avoids some complications we must deal with, but GROPER’s indexing 
does not differ too much from this simple approach. 

Some past work has also been done on this subject. Schwartz and Sharir[22] have 
dealt with the complexities that ensue when we wish to index using continuous curved 
edges, instead of straight fines. They do not, however, apply this approach to unconnected 
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Figure 6.1: On the left, three edges from a model. On the right three image edges. Each 
pair of image edges is compatible with the corresponding model edges, but as a whole, the 
image edges do not match the model. 

curved edges. Wallace[25] develops an indexing system for pairs of unconnected edges, 
but does not deal with error or occlusion. Thompson and Mundy[23] use indexing to find 
the pairs of vertices in a three-dimensional object that correspond to a pair of vertices 
in its two-dimensional image. This type of approach cannot handle arbitrary groups of 
image edges, but it does address the problem of recognizing three-dimensional objects. 
Another important approach to indexing is Ettinger’s parts-based approach (Ettinger[7]). 
His system looks for common sub-parts of objects, and uses them to index into a library of 
objects. Chapter 8 discusses these approaches in more detail. 

The main piece missing from the simple description above is that we wish to index 
using groups with more than two edges in them. Although we could simply compute more 
parameters describing groups containing more edges, it would not be practical to build up 
a table that contains entries for the parameters that describe every possible combination 
of edges in an object; if an object has n edges it can produce 2 n different combinations of 
edges. 

So instead, GROPER breaks the indexing problem into smaller problems and combines 
the results. It builds a table that contains entries for every pair of edges in each object 
model. Then, to identify the model edges that might have produced a group of image edges, 
GROPER looks in this table to find all the pairs of edges that might have produced each 
pair of edges in the group and combines the results. This approach leaves two separate 
problems: how to perform indexing with a pair of edges, and how to combine the results. 

In determining how to perform indexing, we must decide on a particular parameteriza¬ 
tion of the relationship between two edges. Furthermore, we must keep in mind that, when 
building the lookup table, we need to make entries, not just for the particular relationship 
between two model edges. We must consider all possible relationships between the image 
edges these model edges can produce, considering error and the fact that only a portion of 
a model edge may show up in the image. 
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In combining the results of these lookups, we must consider the particular problem that 
a set of pairwise consistent matches may not be globally consistent. For example, figure 
6.1 shows three image edges, A, B, and C, and three modeled object edges, A’, B’, and C\ 
which form a triangle. Clearly A, B, and C could not have come from model edges A’, B’, 
and C’ respectively. And yet, upon examination we see that edges A’ and B’, considered 
alone, could have produced edges A and B. Similarly, B’ and C\ considered alone, might 
have produced B and C, and A’ and C’ might have produced A and C. GROPER’s indexing 
system therefore takes special care in order to come up with matches between image edges 
and object edges that have global consistency. Grimson and Lozano-Perez[lO] discuss this 
issue in more detail, and provide some information about how frequently it occurs. 


6.2 Choosing a Parameterization 

Many different ways exist of choosing parameters that characterize the relationship between 
two edges. A good choice will have a number of desirable traits. First of all, we would 
like a parameterization that we can compute quickly, at run time. Secondly, we would like 
a parameterization that will help us to ensure the global consistency of matches when we 
combine the matches found for pairs of edges. As this section will relate, GROPER has 
a parameterization particularly suited to this task. Thirdly, a good parameterization will 
facilitate the construction of an object library by making it easy to find the range of values 
a pair of edges may create, when subject to error and occlusion. 

This problem of local consistency and global inconsistency emerges because the fact 
that edges A, and B could come from some fragment of edges A’ and B’ does not tell us 
which fragments of A’ and B’ could produce A and B. Does A come from the beginning of 
edge A’, its end, or somewhere in the middle? The match of A and B to A’ and B’ may 
work only if A comes from the beginning of A’, while the match of A and C to A’ and C’ 
works only when A comes from the end part of A’. We can check these two local matches 
for consistency by seeing what they imply about the location of the fragment of A’ that 
produced A, relative to the start of A’, and making sure these locations are compatible. 
So, GROPER uses a parameterization of edge relations that makes it easy to tell, when a 
pair of edges match modeled edges, what this match implies about the location of one of 
the image edges in the model edge that produced it. 

GROPER makes use of two different parameterizations, one for nearly parallel edges, 
and the other for non-parallel edges. 

6.2.1 Non-parallel Edges 
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Figure 6.2: Two edges axe in bold. Five parameters describe their relationship. 

GROPER uses the angle between two edges as one of the parameters that describes 
them. GROPER then calculates the point in the plane where the two lines defined by the 
edges intersect. GROPER then measures the minimum and maximum distances from each 
edge to this point. Figure 6.2 depicts these five parameters. 

These parameters will make it easy to discover at what part of an edge an edge fragment 
originates. For example, suppose we have two perpendicular model edges located at ((0 
10) (0 20)) and ((15 0) (23 0)). We would describe these edges with the angle and the 
distance ranges (10,20) and (15,23) respectively, since the two lines the edges define would 
intersect at the origin. If we then detected two image edges with parameters -j> (15,20) and 
(15,23), we would know that the first image edge must come from the second half of the 
first model edge. 

With this parameterization it also becomes easy to determine the range of parameter 
values that fragments of two edges might produce. Two fragments of edges always appear 
at the same angle as the original edges. And since a fragment of an edge defines the same 
line in the plane as the full edge, two fragments of edges have the same intersection point as 
the two original edges. So the maximum and minimum distance from the edge fragments 
to the intersection point will fall within the range of the maximum and minimum distances 
for the whole edge. 

Error can also affect the range of possible values for these parameters. If error occurs in 
detecting the angle between two edges, this will obviously affect the angle between the edges, 
but it will also affect the distance from the edges to their intersection point. Suppose an 
error of ^ may occur in determining the angle of an edge. Then, for the two edges described 
above, instead of always expecting an angle of ^ between the image edges they produce, 
we must expect an angle somewhere in the range (fo'i 20 )• Furthermore, error in angle can 
also effect the location of the intersection point for two edges. For example, if the first 
edge produces an image edge still sensed to begin at (0 10), but with a ^ difference in 
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its angle, the intersection point between this edge and the edge from (15 0) to (23 0) may 
fall anywhere between ( — 3^,0) and (3|,0). This error alone will affect the minimum and 
maximum distance from each edge to the intersection point, and of course error may alter 
the sensed angle of the other edge as well. The extent to which error in angle affects the 
maximum and minimum distances to the intersection point will depend on the initial angle 
between the edges, as well as on their location. For example, error in the angle between 
two almost parallel edges will affect the location of their intersection point far more than 
it will affect two perpendicular edges. When GROPER makes entries for two edges in its 
indexing lookup table, it must make sure that these entries cover all possible parameters 
these edges can produce; and these parameters can vary quite a bit, in related ways, due 
to error in the sensed angle of an edge. 

In order to build the indexing table, GROPER calculates the ranges that the four 
distance parameters can assume for a particular range of angle values. GROPER finds 
the extreme possible distance values for this range of angle values simply by calculating 
the distance values for the extreme possible orientations. First of all, GROPER considers 
that each edge might produce a tiny fragment, either where the edge starts or where the 
edge ends. The two edges could produce any pair of such fragments, one from each edge. 
The extreme distance values come from these pairs of fragments. We consider only tiny 
fragments, because, at this stage we calculate only the effect of errors in angle. Taking a 
full edge and rotating it by ^ would produce not only changes in the angle of the edges, 
but also large changes in the location of the edge. Upcoming paragraphs will discuss how 
GROPER separately accounts for the effects of errors in the location of the edges. 

So GROPER considers all four possible pairs of these extreme edge fragments. Then for 
each pair, we consider four different possible orientations of that pair. The range of allowed 
angles provides a minimum and maximum possible angle. And for each of these extreme 
angles we consider tilting the edges clockwise as much as possible, while still achieving that 
angle and staying within the maximum allowed error in orientation of an edge. And we 
consider tilting the edges as much as possible counter-clockwise. So we have four possible 
extreme pairs of edges, for each pair we can tilt the edges to produce the minimum or 
maximum allowed angle, and then for each of these eight combinations we can tilt the 
edges as much as possible clockwise, or as much as possible counterclockwise, producing a 
total of sixteen possible combinations. GROPER computes the maximum and minimum 
distance to the intersection point produced by each combination, and then takes the total 
maximum and minimum values over all these combinations as the values of the needed 
parameters, accounting for angle error. 

So we find the extreme values by only considering the extreme angles, and the edge 
fragments at the ends of the edges. To see that extreme values do not occur in other places, 
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notice that as we vary from one extreme to another, the parameters produced also vary 
from one extreme to another. For example, if we varied the angle between the extreme 
minimum angle and the extreme maximum angle, we would be able to rotate the edges 
more and more, producing values that change continually in one direction only. 

Sensing edges can also produce error in determining the location of the edge. We can 
divide this error into two parts. Error may displace the location of an edge in a direction 
tangent to the edge. This will result in error in determining the maximum and minimum 
distance from that edge to the intersection point. For example, if we sense the above 
mentioned edge at ((0 12) (0 22)) instead of at ((0 10) (0 20)) then we will find it has 
minimum and maximum distances to the intersection of (12, 22) instead of (10, 20). But this 
will not effect the distance between the other edge and the intersection point. Alternately, 
error in the direction normal to an edge will produce a change in the distance from that 
edge to the intersection point. So the effect on these parameters of error in the location of 
an edge depends on the angle between the two edges. Potentially, a small error in location 
can cause a big change in the value of these parameters, if the two edges form a small 
angle. In such a case, a small error in determining the angle of an edge will also produce 
a big change in the location of the edges’ intersection point, and hence in their maximum 
and minim um distances to that point. So we can approximate the effect of error in the 
location of an edge by adding a small amount of padding to its maximum and minimum 
parameter values. In cases where a small error in the location of an edge results in a big 
change in these values, we will have already put a lot of padding into the allowed range of 
these values, by accounting for the large possible changes caused by possible errors in the 
angles of the edges. In this way, determining the possible values of these parameters in the 
face of error becomes a problem of analyzing the effect of error in the angle of the edges. As 
the confusing nature of this discussion indicates, this particular choice of parameterizations 
does not lend itself especially well to determining the ranges of parameter values in the 
presence of error. 

Once we know how to compute the range of possible parameters for a pair of edges, we 
can build a table of all possible parameters for all pairs of edges. Two problems emerge 
when we do this. First of all, we must quantize these values. And secondly, every pair of 
edges has quite a large number of possible parameter values. 

In order to put values into a table we need to round them off. This quantization will 
introduce some extra padding in the range of allowed parameter values. To determine the 
table entries we need to make for a specific pair of edges, GROPER considers the range 
of angles these edges might produce, given the tolerated amount of error in sensing the 
angles between them. Then GROPER divides this range up into subranges. GROPER 
uses subranges of size A pair of edges might produce angles in a range from to 
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GROPER divides these into the subranges (-^f )> and )• Then 

for each subrange, GROPER determines the minimum and maximum possible distance from 
each edge to the intersection of the edges, using the algorithm described above. GROPER 
adds some extra padding for possible errors in the locations of the edges, and then uses 
these values to make entries in the table. 

We could view our table as a five dimensional array, containing an entry for every pos¬ 
sible quintuple of parameters an edge might produce. This approach has the disadvantage 
that a single pair of edges would produce a large number of table entries. We must consider 
every combination of distance ranges, one from each edge. And each edge will have many 
valid distance ranges, one for each subsection of the edge. So each edge has a number of 
distance ranges proportional to the square of its length. And the number of different pairs 
of distance ranges for a given angle will be proportional to the product of the squares of 
the edge lengths. 

Instead, a trick reduces the amount of space needed, at the cost of some run time 
processing when GROPER performs lookups. Instead of a five dimensional array, GROPER 
uses a three dimensional array, indexed with the allowed angle between the edges, the 
allowed distance from edge one to the intersection point, and the allowed distance from 
edge two. So GROPER makes an entry in the table for every allowable distance from a 
point in edge one to the intersection, and from a point in edge two to the intersection. 
Later, when performing a lookup based on two image edges, GROPER actually takes the 
intersection of two lookups. First it looks in the table under the angle between the edges, 
and their minimum distances to the intersection point. Any pair of model edges that 
might have produced these two image edges will have an entry there. Then GROPER 
does a second lookup, using the angle between the edges and their maximum distances. 
The intersection of these two lookups provides all pairs of model edges that could match 
these image edges. This requires extra time, both because of the two lookups required, and 
because GROPER must intersect two sets, potentially a good deal larger than the result. 
But the table entries for a pair of edges now only require space proportional to the product 
of the lengths of the edges. 

This explains how GROPER finds the pairs of object edges that match two image edges. 
It remains to discuss how GROPER combines these pairs to find the consistent matches 
for a group of edges. It does this by keeping track, for each match, of the possible amount 
of the front of one of the model edges that might be missing from the corresponding image 
edge. Suppose, for example, GROPER finds a match between image edges A and B and 
model edges A’ and B\ And suppose GROPER knows that, even with error, model edge 
A’ can not have a point less than 20 units away from A’s intersection point with B\ Then, 
if A begins 20 units away from its intersection with B, GROPER can conclude that the 
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beginning of A is also the beginning of A’; none of the front of A’ has failed to show up in 

A. 

In general GROPER does this in the following way. When making an entry in the 
lookup table for a pair of edges, GROPER records the minimum and maximum possible 
distance between the start of the first edge and the intersection point of the two edges. 
GROPER also records the length of the first edge. Call these three values min-d, max-d, 
and obj-len, respectively. When looking up a pair of edges that matches this pair, GROPER 
will know the minimum distance between the first edge in the image and the intersection 
point, as well as the length of the first edge. We will refer to these values as min-im and 
im-len. GROPER can now find the minimum possible length of any missing front part of 
the model edge with the expression: 

(max 0 (- min-im max-d)) 

On the one hand, the length of any missing front section of the edge can not be less then 0. 
On the other hand, this length also can not be less than the difference between the distance 
from the start of the image edge and the intersection point, and the largest possible such 
distance. For example, if the image edge starts 40 units from the intersection point, and 
the largest possible distance from the start of the model edge, considering error, is 30 units, 
then at least the first 10 units of the object edge must be missing from the image edge. 

Similarly GROPER calculates the largest possible length that could be missing from 
the front of the image edge with the expression: 

(min (- min-im min-d) (- obj-len im-len)) 

This distance can not exceed the difference in lengths between the two edges. It also can 
not exceed the difference between the distance from the start of the image edge to the 
intersection point, and the smallest possible such distance for the object edge. 

GROPER then combines these results to check the consistency of a match between a 
group of image edges and a collection of object edges. Suppose GROPER wants to find the 
model edges that might match the image edges (ABC D). GROPER pairs edge A with each 
of the three other image edges, and performs a table lookup for each pair. With each match 
to a pair of model edges, GROPER determines a range of possible amounts missing from 
the front of A. Then when GROPER combines some matches, it intersects these ranges. If 
the intersection is null, GROPER knows that those matches are not compatible. GROPER 
finds the matches for (A B CD) by repeating this process using B, C and D in turn as the 
first edge, pairing each with the remaining three edges. GROPER finds the sets of model 
edges that match these image edges and provide a consistent set of values for the amount 
missing from the first image edge. It takes the intersection of all these sets of matches, 
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producing the matches for which a consistent location exists for all the image edges with 
respect to the corresponding model edges. 

This approach requires that GROPER combine the results of performing a separate ta¬ 
ble lookup for every ordered pair of edges in a group. Furthermore, for each pair GROPER 
actually takes the intersection of two lookups, using first the minimum and then the max¬ 
imum distances to the intersection point. At the cost of some extra run-time processing, 
this provides greater accuracy of indexing, while reducing space requirements. 

6.2.2 Parallel Edges 

The parameterization discussed in the previous section is not suitable for parallel or nearly 
parallel edges. Parallel edges have no intersection point. So if we allow for error, nearly 
parallel edges will produce an infinite range of possible distances to an intersection point. 
This section discusses a second parameterization used for almost parallel edges. 

When making table entries for a pair of edges, GROPER does not decide to exclusively 
use one set of parameters or the other. Rather, for errors in the angle of the edges that would 
make them nearly parallel, GROPER calculates the possible values of the parameterization 
discussed in this section. For errors in the angles of the edges that do not make them 
nearly parallel, GROPER uses the parameterization discussed in the previous section. For 
example, suppose two edges have an angle of GROPER considers two edges nearly 

parallel if they are within an angle of pj of being parallel. However, allowing for error, these 
two edges may form an angle anywhere from to GROPER breaks this range up 
into three subranges: (fo’io)’ ant ^ (io’®)- F° r the first two subranges, GROPER 

makes entries using the non-parallel parameterization of the previous chapter. But for the 
last subrange, GROPER calculates the possible values that a different set of parameters 
may have, as this section will describe. 

For nearly parallel edges, GROPER uses the following five parameters: the angle be¬ 
tween the edges, the minimum and maximum distance from the middle of a fragment of 
the first edge to any point on the second edge in the direction tangent to the first edge, 
and the minimum and maximum distances in the direction normal to the first edge. These 
five parameters do not fully specify the relationship between the two edges, because they 
do not indicate the length of the first edge. However, this choice of parameters does allow 
GROPER to fit these five parameters into a three dimensional array, just as it did with the 
parameters for non-parallel edges. Furthermore, since GROPER puts the length of the first 
edge into the table with each entry, it can confirm that no matched pair of image edges has 
a first edge longer than the matching first model edge (taking possible error into account). 

GROPER calculates the range of possible values for these parameters in much the 
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same way as it did for the non-parallel parameterization. It also uses a similar method for 
enforcing the global consistency of matches. However, parallel image edges do not always 
provide much information about the location of the image edge within the object edge. 
If, for example, we have two parallel model edges, ((0 0) (0 10)) and ((5 0) (5 10)) that 
produce two image edges ((17 1) (18 1)) and ((17 6) (18 6)), we can not tell anything about 
what part of the model edges the image edge fragments come from. 

To enforce what consistency it can, GROPER stores in an entry the minimum and 
maximum possible distance from the midpoint of a fragment of the first edge to the second 
edge, in the direction of the tangent of the first edge, along with the length of the first edge. 
GROPER then applies exactly the same procedure as it used for non-parallel edges. 

Thus the parameters chosen for almost parallel edges allow GROPER to build and 
use the index table in the same way for parallel and non-parallel edges. In both cases 
GROPER uses the same three dimensional array, combining the results of two separate 
lookups. This is possible because, in both cases, four of the parameters actually represent 
two ranges of possible values, which may vary when we take fragments of the model edges. 
GROPER can also use the same procedure to enforce consistency in both cases, because each 
parameterization uses a value that varies according to the length of the section of the front 
of the model edge that fails to show up in the corresponding image edge. However, because 
the parameters discussed in this section do not fully describe the relationship between the 
two edges, some inefficiency may result. The initial table lookup may produce some pairs of 
edges in which the first edge is too short to match the first image edge. Although GROPER 
weeds these out using the length of the model edge as stored in the table entry, this process 
may take some extra time. A more complete parameterization might allow GROPER to 
never have to consider these matches at all. 

0.2.3 Evaluation 

First of all, indexing will produce incorrect matches because it makes many simplifications. 
For example, GROPER puts table entries into buckets that cover a significant range of 
values. So GROPER may find that a pair of image edges produce a parameter value of 1, 
and use that value to look up pairs of model edges in a table. If buckets have a length of 20, 
then in the same bucket of the table that has the correct pairs, GROPER may also find a 
pair of edges that could not produce a parameter value of less than 19. Mistakes may also 
creep in because GROPER, for a particular range of angle values, calculates the maximum 
and minimum values that a pair of edges might produce for the other parameters. However, 
while a pair of edges with an angle between them of | might produce a value of 10 for one 
parameter and a value of 20 for a different parameter, it might not be able to produce both 
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values at the same time. Yet the table entry will not indicate that. Chapter 9 will discuss 
tests that show how frequently these errors occur, and argue that they will not have much 
effect on GROPER’s overall performance. 

Secondly, as mentioned before, GROPER’s approach to indexing seeks to simplify the 
enforcement of global consistency, and the building of a three-dimensional array that de¬ 
scribes all possible relations between fragments of edges. It does this at the expense of 
some ease of implementation. But more importantly, this approach to indexing requires 
combining the results of a large number of table lookups performed on a fairly densely 
packed table. This results in greater run-time costs. 

The amount of time GROPER takes to perform indexing with this scheme will depend 
mainly on the density of the lookup table. To find matches for a group of image edges, 
GROPER performs the following pseudo common lisp code fragment: 

(let ((consistent-matches-list ; list of matches with 

(loop for edge in group-of-edges ; consistent location 

collecting ; for a single edge. 

(let ((matched-pairs 

(loop for edge2 in (remove edge group-of-edges) 
collecting 

(intersect (lookup-min-values edge edge2) 

(lookup-max-values edge edge2))))) 
(combine-into-consistent-matches matched-pairs))))) 

(intersect-matches consistent-matches-list)) 


The function “intersect” takes the intersection of the matches found in the table for the 
minimum parameter values a pair of image edges produces, and for the maximum values. 
If a group has M image edges, GROPER must perform this function M(M - 1) times, 
with a cost that increases linearly with the number of values in an average cell looked up. 
“Combine-into-consistent-matches” takes all the pairs of model edges that match the image 
edge pairs and, for one of the image edges, constructs all groups of matches that imply a 
consistent location for that one image edge. GROPER must perform this M times, with 
the cost of each related to M times the number of matches for a typical pair of image 
edges. Finally, “intersect-matches” intersects the M results of this routine, with a cost 
proportional to the number of sets of model edges that consistently match the image edges 
for the position of at least one of the image edges. 

What does this analysis tell us? First of all, we can measure the cost of using a three 
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dimensional array instead of a five dimensional array as ( M)(M - 1) times the typical 
number of entries in one cell of the array. However, as the number of objects in the model 
library grows, we expect this density to grow linearly. This means that this indexing scheme 
will have an 0(M 2 N) cost, for N objects modeled in the library. Chapter 9 provides more 
specific information about experiments that measured the density of the lookup tables. 

Secondly, we see that the way to reduce the cost of indexing in general is to reduce the 
density of the lookup table. Most of the costs of indexing depend on how many matches turn 
up at each stage. The cost of “intersect” depends on the number of entries in a cell. The cost 
of “combine-into-consistent-matches” depends on the number of model pairs that match 
the image pairs. Only “intersect-matches” depends mainly on the number of groups of 
model edges that will match the group of image edges in question. With accurate grouping, 
a recognition system should not need to perform too many indexing steps. Nonetheless, 
for large libraries the indexing described in this chapter may prove too slow. The following 
section describes some ways to speed up indexing by using a more sparse lookup table, at 
the expense of some space. 


6.3 Alternate Approaches to Indexing 

In indexing, a fundamental trade-off exists between space and time. To speed up indexing 
we would like to use a less dense lookup table, and require fewer steps to combine the results 
of indexing. We can accomplish both these goals by performing indexing based on triples 
of edges instead of pairs. A straightforward implementation of this idea will require a lot 
more space, however, so we will also suggest some ways to address this space problem. 

Suppose we built a lookup table with entries that describe the possible relationships 
between every triple of model edges. Such a table would have two key advantages. First of 
all, it would remove the need to perform extra lookups to check the global consistency of 
matches. Secondly, it would create a much sparser table. 

When a pair of non-parallel image edges match a pair of model edges, this match 
determines the rotation and translation needed to bring the model into alignment with the 
object in the image. Global inconsistency arose with matches between a series of pairs of 
edges because the rotation and translation implied by one match might differ from the one 
implied by another match. Using triples of edges instead of pairs, however, we can insure 
that each triple has two edges in common with one other triple. If these two edges match 
the same two model edges in a match for each triple, then the two matches must actually 
require the same rotation and translation, implying a global consistency between all the 
edges in the two matches. 

An example will make this clearer. Suppose we have a group of image edges, A, B, C, 
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D. We use the parameters that describe edges A, B, and C to look up in a table, finding the 
matching model edges A’, B’, and C’. A second lookup reveals that B’, C\ and D’ match 
B, C, and D. In both matches, B and C are matched to B’ and C’. Matching B and C to B’ 
and C’ fully specifies the location of the object in the image. So this location must be the 
same for the first match as it is for the second. And the location of the object that aligns 
A with A’ must also align D with D’. In contrast, when we used pairs of edges, A and B 
might match A’ and B’ for one location of the object, while B and C matched B’ and C’ for 
a completely different location of the object. So those two matches between pairs of edges 
would tell us nothing about the compatibility of A matching A’ with the match between C 
and C’. 

Some potential problems still exist for this approach, which has not been implemented. 
Because a match between two parallel image edges and two parallel model edges do not 
fully determine the location of the model in the image, we should order these lookups so 
that no two triples have only two parallel edges in common. And a single lookup specifies 
the location of the object model only to within some error bounds. Over multiple lookups 
these errors might accumulate, so that the location of the object implied by the first match 
differs slightly from the location implied by the second match, which again differs a bit from 
the location implied by the third match. This might result in the first and third matches 
differing by more than we would like to allow. However, it seems probable that we can 
implement an indexing scheme using edge triples, which will eliminate most problems of 
global inconsistency. And, while GROPER’s indexing approach requires that to find the 
matches for a group of N edges it must find the matches for ( N)(N — 1) pairs of edges, 
using triples it would only need to consider N - 2 sets of edges. 

Edge triples have the second advantage of creating a sparser lookup table. So that we 
may easily compare the sparseness obtained by using triples with that obtained with pairs, 
let us forget about the trick that allows GROPER to use a three-dimensional table, and 
suppose that a single lookup will produce only model edges that could have produced the 
two or three image edges. This means using a five dimensional array for pairs of edges, and 
a nine dimensional array for triples. (Nine parameters suffice to capture the relationship 
between three edges. We could use two angles, the location of the centers of two edges 
relative to the center of the third, and the lengths of the three edges). The run-time cost 
of each indexing scheme will depend on how many matches a single lookup produces, since 
we must then take the intersection of a series of such lookups. 

Our library of object models will typically have fewer triples of edges that match three 
image edges than pairs that match pairs of image edges. To argue this, we may think of 
a triple as consisting of a pair of edges plus one extra. We then consider whether adding 
the extra edges creates more matches than the pair alone had, or fewer. For any pair of 
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model edges that match the first two image edges, we will usually find at most one model 
edge that could have produced the third image edge, and often we will find none. This is 
because matching the first two image edges will determine a specific location of the object 
model in the image, unless the edges are parallel. Given that single possible location of the 
object model, only one model edge will align with the remaining image edge. Allowing for 
error may occasionally produce more than one possible match. But on the other hand, if 
the first two image edges did not really come from an object that they happen to match, 
we can expect that quite often that object will not have any match for a third image edge. 
Chapter 9 shows some experiments that indicate the amount of a reduction in processing 
we could achieve using edge triples instead of doubles. 

The use of edge triples instead of pairs offers the potential for significant speed-ups 
in the time required for indexing. However, it would also require significantly more pre¬ 
processing time and space. While a model with N edges has only 0(1V 2 ) pairs of edges, it 
has 0{N 3 ) triples of edges. This might make the use of triples counterproductive when N 
grows large enough so that the lookup table fills available primary memory. However, we 
have a variety of possible compromises at our disposal. 

We could use two lookup tables, one for pairs of edges and one for triples. This allows 
us to save space if we put every pair of edges in the first table, but only some of the triples 
in the second. This will work as long as we can tell at run time in which table to look. 

For example, because grouping tends to lump together edges that come from single 
convex sections of an object, we could make entries in the triples table for all triples of 
edges for which two of the edges come from the same convex section of the object. Then 
when we form a group of image edges that appear to have two edges from one convex section 
of an object, we could look up all the triples in this group in the triples table, not using 
the pairs table at all. We would only need to use that table if we had a group of edges for 
which all edges seemed to come from convex sections that produced no other edges in the 
image. 

Alternately, since grouping favors nearby edges, we could make entries in the triple table 
when two, or maybe all three of the edges could produce fragments sufficiently close to each 
other. Then, when looking up a group with edges near each other we could use the triples 
table, using the pairs table otherwise. 

In a sense Lowe’s system SCERPO [18] has already used the idea of creating special 
lookup tables geared towards the characteristics of a grouping system. It performs grouping 
based on certain properties, such as parallelism or co-termination. Then SCERPO indexes 
into a table containing only those edges that have these special characteristics. For example, 
if SCERPO finds two parallel edges, it performs an indexing step to produce all pairs of 
parallel edges in the object model for which it is looking. This immediately produces a set 
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Chapter 7 


Verification 


When grouping and indexing have found a collection of image edges that could only match 
a limited number of sets of model edges, GROPER uses verification to decide which if any 
of these matches to accept. The verification process takes a match between image edges and 
model edges, and determines a rotation and translation that will bring the object model 
into alignment with the image edges when placed in a common coordinate system. If no 
such rotation and translation exist, then GROPER knows that this supposed match does 
not really work. If GROPER does find a rotation and translation, it uses them to look 
for other image edges that can align with edges of the model. Finally, based on all the 
matching image edges, GROPER decides whether the image provides enough support for 
this hypothesized position of an object. If it does, GROPER accepts the hypothesis and 
removes all matched edges from the image before looking for more objects. 

This process presents three problems. First, how to find a rotation and translation 
based on some matches between edges. Second, how to look for more edges based on this 
rotation and translation. And third, how to decide if a match provides enough evidence 
for the existence of an object. GROPER does not do anything original in addressing these 
questions. So we will discuss these questions only briefly, referring the reader to previous 
work for more detailed information. However, we must also consider two other issues. 
Before describing how GROPER performs verification, we explain why GROPER needs 
verification when it already has an indexing module. And after describing the verification 
module, we evaluate it in light of the kinds of problems that prompted the development of 
GROPER in the first place. 
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7.1 Why Verification? 


GROPER needs a verification module for two reasons. First of all, as chapter 6 explains, 
indexing may make mistakes, and come up with some sets of model edges that could 
not really have produced the image edges used to index. Secondly, a hypothesis driven 
verification step offers a more efficient method of finding additional support for a match 
then does further grouping and indexing. 

Finding a correct correspondence between some image edges and modeled object edges 
does not completely solve GROPER’s recognition problem. If GROPER finds that a group 
of image edges could match a few different sets of model edges, GROPER still does not 
know which of the matches is correct. Furthermore, GROPER must consider the alternate 
hypothesis that the image edges do not all really come from a single, known object, but 
just coincidentally match up with a few known objects. Before having to decide between 
these alternatives GROPER would benefit by knowing the additional edges in the image 
for which any of the proposed matches might account. If GROPER finds that one of the 
proposed matches would also account for several other edges in the image, while the other 
matches derive no additional support, this might lead GROPER to prefer the first match 
over the others. Also, once GROPER succeeds in recognizing an object, it needs to remove 
from the image all the edges that object produced, so that it does not use any of them in 
recognizing different objects. 

We could imagine looking for additional edges that fit our matches using grouping, but 
that would not produce a quick and certain answer. Even if some edges do not combine 
well with the image edges in the current group, we still want GROPER to find them if 
they would align with one of the matched objects. And once we have a small number of 
hypothesized matches, it is easy to see where those matched objects would appear in the 
image, and compare their proposed position with all the image edges. After doing that, we 
know for certain that GROPER has found all image edges that might have come from any 
of the matched objects that indexing has suggested. 


7.2 Computing a Rotation and Translation 

To perform verification, the first thing that GROPER must do is to determine what a match 
tells it about the location of an object. As mentioned in chapter 6, once we have a mapping 
from at least two non-parallel image edges to two edges in an object, this unambiguously 
tells us the location of the object in the image. However, due to error, we do not expect 
that all the matches between image and model edges will indicate exactly the same location. 
So this section will describe a technique for combining all of the data provided by a match 
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to determine a rotation and translation of the object model that brings it into a close 
alignment with the corresponding image edges. The author adapted the code in GROPER 
that performs this task from code in the system RAF [9] of Grimson and Lozano-Perez. 
Their paper contains a more complete description of this process. 

This step of verification proceeds in two stages: determining first a rotation and then 
a translation. We can view the problem as one of finding a three degree of freedom trans¬ 
formation that aligns the two sets of edges, which lie in different coordinate systems. This 
transformation will consist of a rotation, which gives the two sets of edges the same normal 
vectors, and then a translation in x and y coordinates, which causes the image edges to lie 
on top of the model edges. 

To find the desired rotation, GROPER simply averages the rotation that would give 
each model edge the same normal as its corresponding image edge. If two rotation angles 
differ by too much, then we know that the edges do not have a compatible rotation, and 
so do not really form a valid match. GROPER makes this decision if two edges produce 
rotations that differ by more than Otherwise, GROPER weights the rotations each pair 
of edges indicates by the lengths of the image edges, under the theory that longer image 
edges provide more reliable data about the angle of the edge. GROPER then determines 
the rotation for this match by taking the normalized average of these weighted angles. 

Using this rotation, GROPER then finds the translation that best aligns model edges 
with image edges. To understand how GROPER does this, we must first understand 
that a single match between an image edge and model edge provides only one of the two 
components of translation. To find the translation, we first rotate the model edge about 
the origin, using the value computed according to the above description. Because the image 
edge may correspond only to a fragment of the model edge, we may have a range of different 
translations available that would each align the image edge with a portion of the model 
edge. However, regardless of which part of the model edge we match to the image edge, the 
translation will have the same component in the direction of the edges’ normals. Differences 
in the component tangential to the edges will cause the image edge to align with different 
parts of the model edge. So if we have two matches involving non-parallel image edges, 
we can find the component of the translation in the direction of two different normals, and 
solve for it. If we have more than two matches, we can find the translation indicated by 
every pair of matched edges, and average them. After this, GROPER checks to see that 
the proposed rotation and translation really does bring each model edge close enough to 
the corresponding image edges. Grimson and Lozano-Perez [9] contains a more detailed 
description of this process for the three dimensional case. 
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7.3 Finding Additional Support 


Once GROPER has calculated the rotation and translation, it can use them to look for 
extra support for the object. GROPER does this quite simply. It calculates the proposed 
location of each object model edge in the image, and then compares this to each image edge, 
to see if the two match. They match if the difference in their angle is less than allowed 
error bounds, and no part of the image edge lies further away than allowed by error bounds 
from some part of the model edge. If they match, GROPER adds this pair to its list of 
matched edges. 


7.4 Enough Support? 

Having found all the image edges that could match an object model in a particular location, 
GROPER must decide whether to accept this match. GROPER will accept a match if it 
accounts for an arbitrarily chosen amount of an object’s perimeter, and if it does not conflict 
with any other acceptable match based on the same initial indexing step. GROPER will 
reject any match for which the lengths of the image edges do not add up to at least 25% of 
the perimeter of the object matched. We have chosen 25% because that amount seems to 
produce good results. If no matches pass through verification, then of course GROPER does 
not accept any matches, and returns to the grouping step to continue looking for objects. 
If one match passes this test, GROPER accepts it as valid. If more than one match passes 
verification, then GROPER compares these matches. If they all use the same image edges, 
then GROPER decides that it has correctly found all the edges for a single object, but 
that the image does not provide enough information to tell it which object. In this case, 
GROPER removes the edges from the image before looking for new objects, and notes 
from which objects these edges might have come. If several matches account for slightly 
different sets of image edges, however, then we have no reliable way of preferring one match 
over the others. Furthermore, accepting an incorrect match would mistakenly remove some 
edges from future consideration, perhaps making it difficult to find other objects. In this 
situation, GROPER does not accept any of the matched objects as correct, and instead 
returns to looking for other objects. 

7.5 Problems with Verification 

Although section 7.1 argued that GROPER needs a verification step, this step does present 
some problems. Verification may cause GROPER to incorrectly match some image edges 
to an object model. This can cause two types of problems. First of all, these incorrect 
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matches may cause GROPER to falsely recognize an object without really having suffi¬ 
cient evidence. But more seriously, mistakes in verification may remove from consideration 
edges that GROPER will need to recognize other objects in the scene. We can not ex¬ 
pect any recognition system to avoid all mistakes. It is always possible for several objects 
to produce edges that between them look exactly like some other object. But the verifi¬ 
cation system discussed in this chapter falls prey to some particularly annoying mistakes 
that seem avoidable. In particular, this section will provide examples of mistakes that 
contradict previously discussed ideas of good grouping, and mistakes that do not provide 
consistent matches. Huttenlocher[13] provides a more formal analysis of some approaches 
to verification and their likelihood of producing errors. 

GROPER attempts to avoid mistakes by using grouping to only consider collections of 
edges that appear to have all come from a single object. The verification step, however, 
offers no such guarantee. Verification may find that the modeled object aligns well with 
some image edge which does not group well with the other image edges matched. In figure 
7.1, for example, edge A lies close to the proposed location of a recognized object. However, 
to a person it seems obvious that edges B and C also came from whatever object produced 
edge A. And edges B and C do not lie near the modeled object. Such evidence would allow 
people to avoid adding edge A to the proposed match. GROPER’s grouping system also 
judges that edges A, B, and C probably came from a single object. But we do not have 
enough confidence in this grouping system to allow it to rule out matches like this. Often 
GROPER groups together edges which did not come from a single object. These types of 
grouping mistakes do not cause serious problems when GROPER uses grouping to order a 
search. But if verification made use of the grouping system, many new errors would result. 

Verification also makes errors because image edges may all align with an object model 
to within error bounds, but still produce global inconsistencies. For example, in figure 7.2, 
two parallel image edges both lie close to a modeled object, in its proposed site. Either 
edge would provide a good match to the object, but not both. In general, we would like to 
avoid matching two sections of image edges to the same section of a model edge. Detecting 
this mistake can be tricky, however, because an image edge can match a range of different 
locations on a model edge, depending on how the sensing error has affected the location of 
the edge. Figure 7.3 provides an example of two image edges which, in one interpretation 
match the same section of a model, but still offer a consistent interpretation. It also seems 
difficult to decide how to respond to such a mistake when we detect it. In figure 7.2, for 
example, we would like a recognition system to accept only one of the two parallel edges as 
a match, but we have no way to decide which one to accept. 

Glaring mistakes also occur when image edges lie close enough to model edges to match 
them, but do not have the right relationships to each other. Figure 7.4 shows three image 
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Figure 7.1: Dotted lines indicate the proposed location of an object. A verification step 
will decide that this object produced edge A, but not edges B and C, even though the three 
edges appear to have all come from the same object. 



Figure 7.2: Dotted lines indicate the proposed location of an object. The two parallel lines 
on the right both lie close to it. In such cases, it can be hard to know which is the right 
match. 


Ill 
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Figure 7.3: In this figure, matching the two right most image edges to the same model 
edge may create a conflict. But, allowing for error, we can also match them to completely 
different sections of the edge. 

edges which lie close to a modeled object. Error bounds do not prevent GROPER from 
matching any of these edges to the object model. But they do not look right. Perhaps 
we should also constrain the image edges to have the same order as the model edges they 
match, but it can be difficult to determine the order of unconnected image edges. Or we 
might try to avoid this type of problem by requiring that nearby edges have similar sensing 
error which brings them into direct alignment with the object model. This would rule out 
interpretations like the one in figure 7.3. In that example, two image edges can match a 
model edge to within error bounds, but we would expect that if the two image edges really 
came from the same object edge, they would look more co-linear in the image. In addition 
to the problems involved in detecting mistakes due to this kind of global inconsistency, we 
also have a difficult problem in deciding which matched edges not to use in order to produce 
a smaller, more consistent match. 


7.6 Conclusion 

GROPER does not make much of a contribution to this topic. Instead, it makes use of the 
type of verification performed by existing recognition systems. Because GROPER operates 
in a domain in which false positive recognitions can cause problems, it does bring to light 
some difficulties with existing verification techniques. In particular, when we do not assume 
knowledge of which side of an edge is figure, and which is background, it frequently happens 
that edges match a correctly located object, even though the object did not really produce 
those edges. A recognition system which attempts to recognize many objects in the same 
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Chapter 8 


Previous Work 


8.1 Introduction 

This chapter discusses how previous work in computer vision relates to GROPER. It has 
two primary functions. First of all, it will acknowledge work that inspired many of the ideas 
in this thesis, while making clear what aspects of GROPER are novel. And secondly, the 
experience gained with GROPER can help us to understand why previous object recognition 
systems have worked well in their domains. 

GROPER differs from past recognition systems primarily in the way it performs group¬ 
ing, and in the way it uses grouping. In retrospect, we can see that many recognition 
systems use grouping to eliminate from consideration all but a small class of combinations 
of edges in an image. GROPER differs from these approaches by using grouping to impose 
an order on the set of edge groups, instead of just using grouping to make a yes or no deci¬ 
sion about whether to consider a collection of edges. This explains why GROPER needed 
a mechanism for estimating the probability that any collection of edges originated in the 
perimeter of a single object. Because they mostly used quite simple grouping methods, 
and because they did not want to risk missing too many objects, recognition systems that 
have used grouping to make a yes or no decision about what groups of edges to consider 
have said “yes” to a lot of groups. As we have seen before, an unordered search through 
a large number of sets of edges leads to a combinatorial explosion and many false positive 
identifications when dealing with complex recognition tasks. 

This chapter will discuss four types of previous work. First of all, many scientists have 
worked on the problem of segmenting images into relatively homogeneous component parts. 
Section 8.2 will briefly discuss that work. Although not directly relevant to the approach 
taken by GROPER, this work may have relevance to possible extensions to GROPER. 
Section 8.3 will discuss past work on the perceptual organization of edges. This work 
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has provided important motivation for the grouping system developed in GROPER, both 
by pointing to important grouping phenomena, and by suggesting ways to explain these 
phenomena. Thirdly, we will discuss some previous approaches to the problem of object 
recognition. Only Lowe has explicitly made use of grouping in a recognition system, and 
his work has provided inspiration for GROPER. But we will also see that we can view 
many other recognition systems as containing implicit grouping components. Finally, we 
will examine other approaches taken to the problem of indexing into a library of objects, 
or otherwise making use of such a library. Most indexing systems either assume an already 
segmented image, or can handle only special collections of edges, such as connected edges, 
or unoccluded edges. GROPER’s indexing system differs from others mainly in that it 
can deal with arbitrary collections of edges. This section will also discuss why the use of 
indexing makes grouping particularly important. Throughout the chapter we will stress 
that other work on grouping has provided important motivation for GROPER, but that 
GROPER uses grouping in quite a different way from these other systems. 


8.2 Segmentation 

Much work in computer vision has focused on the problem of segmenting images. The goals 
of that work, although superficially related to this thesis, actually differ from it quite a bit. 
Most work in image segmentation seeks to break an image up into homogeneous parts. 
It does not help to tell whether two parts, separated in the image, actually come from a 
single, unoccluded object. Nor does it help us when several different materials make up the 
surface of a single object. Traditional segmentation attempts to divide an image based on 
the materials in it, rather than the objects. 

Haralick and Shapiro[12] survey some segmentation systems that attempt to divide im¬ 
ages into sections of relatively homogeneous intensity levels. For example, as a simple ap¬ 
proach one can find clusters in an intensity histogram of an image, and then form segments 
from the connected pixels whose intensities all fall within the same cluster. See Carlotto[6] 
for a recent approach along these lines. Alternately, the split and merge approach breaks 
apart segments that are not sufficiently homogeneous, and combines similar segments (see, 
for example, Pavlidis[21]). Researchers have developed many other approaches as well. For 
our purposes, the point is that these approaches to segmentation only produce segments 
that do not have too great a variation in their intensity. 

Three reasons make these types of approaches inapplicable to GROPER without signif¬ 
icant modification. First of all, they do not directly provide the information that GROPER 
needs, the likelihood that two unconnected segments in the scene originated in the same ob¬ 
ject. Secondly, in principle, these methods cannot handle objects made from more than one 
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material. A good segmentation system wants to separate the wooden handle of a hammer 
from its metal head, and will not tell us anything about the likelihood of the two segments 
they form coming from the same object. And thirdly, work on segmentation has focused on 
deter minin g a single, optimal segmentation of the scene. Many applications require such 
an output. But GROPER’s approach to recognition assumes that any single decomposi¬ 
tion of a scene will probably contain mistakes. So GROPER requires an ordering of the 
space of all possible segmentations. After getting nowhere with the best segmentation of 
a scene, GROPER wants to know the second best segmentation, so it may try that next. 
For this purpose we need an approach to segmentation that can assign a probability to any 
segmentation, or partial segmentation of an image. 

However, a more sophisticated grouping system might make use of some previous work 
on segmentation. First of all, we might use segmentation to derive good primitives for 
grouping. A conservative segmentation algorithm might produce, with high reliability, small 
sections of the image extremely likely to come from the same object. Such a system would 
only fail when two adjacent or occluding objects have similar intensity levels, which may 
not happen often. These primitives might prove as reliable as the nearly connected convex 
edge sections used by GROPER, and more informative, because they allow the extraction of 
intensity and texture information. Secondly, segmentation work has developed machinery 
for determining whether the intensities of two regions of an image indicate that they come 
from the same object. The split and merge approach, for example, must make a judgment 
about whether two regions of an image are similar enough to justify merging them. One 
might be able to modify this work to provide a probability, instead of a strict yes or no 
answer. 

Although they do not answer the questions GROPER needs to ask, segmentation sys¬ 
tems do provide much insight into the use of intensity information in grouping. As men¬ 
tioned in chapter 3, possible extensions to GROPER would find such insights quite valuable. 

8.3 Perceptual Organization 

Much work by psychologists and computer scientists has investigated human grouping phe¬ 
nomena. This work has contributed to the ideas in this thesis primarily by suggesting 
explanations for grouping phenomena, and by identifying various grouping phenomena. 

Two explanations of grouping have particular relevance to computer scientists attempt¬ 
ing to use grouping phenomena in a recognition system. First of all, it has been suggested 
that people tend to group together parts of an image particularly likely to have come from 
a single object. This thesis focuses on that type of grouping phenomena. It has also been 
suggested that people form groups that will prove particularly useful to the remainder of 
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the recognition process. In this view, one does not arbitrarily group any edges coming 
from the same object, rather one groups to identify a useful relationship between some of 
the edges in an object. These views do not contradict each other. One may explain some 
grouping phenomena in one way, and other phenomena in the other way, or consider that 
both factors contribute to certain grouping tendencies. 

Lowe[18] shows that grouping can form collections of edges particularly useful to the 
recognition of three dimensional objects. For example, grouping together parallel edges 
proves useful because, under certain viewing assumptions, parallel edges in an object appear 
parallel in any image of that object, regardless of its orientation. Once one has identified 
two parallel edges in an image, one need consider matching them only with parallel edges 
in the object. So one can also explain grouping as a way of creating good primitives of 
comparison for recognition. Biederman[l] also suggests using this type of grouping in a 
recognition system. 

The Gestalt psychologists, who first studied grouping, suggested that one groups to¬ 
gether parts of an image that tend to correspond to objects in a scene. Kohler in Gestalt 
Psychology[ 16], his review of the work of the Gestalt movement, points out that we group 
together adjacent surfaces with common surface properties, and that such surfaces have a 
greater likelihood of coming from a single object than do adjacent surfaces with different 
properties. Witkin and Tenenbaum[27] discuss the hypothesis that we group together things 
that have a relationship unlikely to occur by accident. These grouped image sections, they 
argue, probably come either from a single object or a single process, such as the symmetric 
streaks caused by a single windshield wiper. And Lowe [18] has attempted to demonstrate 
mathematically that parallel edges in an image have a greater likelihood of coming from a 
single object than do non-parallel edges. 

This thesis does provide somewhat more explicit reasons for suspecting that certain 
collections of edges come from a single object. Instead of just asserting that nearby edges 
in an image are likely to come from a single object, we attempt to provide an analysis of the 
image formation process to back up this hypothesis. This approach also provides a more 
quantitative analysis of grouping. 

The Gestalt psychologists pointed out a number of different grouping phenomena, in¬ 
cluding the distance constraint used in GROPER. However, past work has not taken notice 
of the orientation constraint, as we develop it here. Although it does not produce as strik¬ 
ing results as constraints discussed by the Gestaltists, informal psychophysics do support 
the hypothesis that people make use of a similar constraint in grouping. This constraint 
was motivated by a parts based view of objects, which others have investigated. Many 
researchers have suggested that we can simply represent objects as composed of a small 
number of convex parts. Whitman Richards particularly emphasized the relevance of this 
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view to grouping in personal communications. 

Finally, Lowe[l8] has emphasized the need for grouping to reduce the computation 
needed for object recognition. We also emphasize its value in reducing false positive iden¬ 
tifications of objects. 


8.4 Recognition 

Many computer systems have approached the problem of visual object recognition as a 
search through the space of possible mappings from image features to object model fea¬ 
tures. More recently, some researchers have recognized the importance of trying to perform 
grouping in a scene before referring to object models. However, a new look at even the more 
traditional recognition systems reveals that many of them contain an implicit grouping pro¬ 
cess to control the complexity of the search for objects. Besl and Jain[3] and Binford[2] 
provide good summaries of work on model-based object recognition. This section will re¬ 
view some of those systems with an eye towards understanding how grouping helped them 
work, and then describe how a more explicit grouping step has benefitted some recognition 
systems. 

ACRONYM [5] was one of the earlier model-based object recognition systems. It mod¬ 
eled objects as generalized cylinders, and then attempted to match model cylinders to 
possible projections of such structures in the image. For example, one might model an air¬ 
plane as having a cylindrical cabin. A cylinder can project in an image as either an ellipse, 
if seen from one end, or as a rectangular “ribbon” if seen from the side. ACRONYM first 
located possible ellipses, or ribbons in the image. It then considered the possible matches 
between these image features and the generalized cylinders of the object. This describes 
only a small part of ACRONYM’S capabilities, but we will focus on that part. 

ACRONYM performed grouping by looking for collections of edges that might corre¬ 
spond to a part of the object for which it searched. It might combine several edges into 
a ribbon, for example. This greatly reduced the amount of search ACRONYM needed to 
perform. Instead of a large number of edges in the image and the model, ACRONYM dealt 
only with a much smaller number of primitives, each of which contained several edges. 
Although effective in its domain, this type of grouping has two problems when applied 
to a more general domain. First of all, it depends on selecting a few primitives, such as 
generalized cylinders, as the parts of the objects for which the system looks. If one tries 
to recognize objects not made of such parts, ACRONYM’S method of grouping will not 
form collections of edges that all come from these objects. And secondly, ACRONYM’S 
grouping system appears to depend on an entire generalized cylinder being visible in the 
image. Brooks demonstrated ACRONYM working mainly on unoccluded objects, such as 
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airplanes on the ground viewed from above. If something occludes a part of an object, the 
grouping component may not form a primitive corresponding to that part. 

Bolles and Cain[4] use a distance constraint to reduce the amount of search needed by 
their Local-Feature-Focus method (LFF). After hypothesizing a match between an image 
feature and a model feature, LFF looks for other features in the image near this initial 
one. It then determines which model features these image features might match, and finds 
the largest mutually consistent subset of these collections of matches. LFF uses grouping 
by only considering a collection of features near each other as having potentially all come 
from a single object. This approach reduces the computation needed for recognition by 
greatly limiting the sets of matches between image and model features that LFF needs to 
consider. This type of grouping is also effective because, as we have seen, nearby features 
have a greater likelihood of coming from a single object than more distant features, so LFF 
focuses on the more likely hypotheses. However, LFF suffers from the disadvantages of 
any all or nothing grouping scheme, that is, a scheme that only decides it will or will not 
consider a hypothesis, instead of ordering the hypotheses. If LFF focuses only on sets of 
features quite close to each other, it runs the risk that an object will not produce enough 
nearby features in the image to make recognition possible. On the other hand, if LFF 
considers larger collections of features, the number of combinations of matches it must 
consider will grow. Since LFF does not order these combinations, it must consider them all 
before making a decision. Furthermore, Bolles and Cain realize that they can not expect 
a collection of nearby features to all come from the same object. So instead of looking for 
a match that accounts for all the features in a local cluster, they must look for a match 
that accounts for the largest subset of these features, a more computationally intensive 
task. GROPER avoids this problem by using additional grouping clues to form collections 
of edges hypothesized to all come from a single object. 

RAF, by Grimson and Lozano-Perez[10], performs grouping specific to the object it 
looks for, using a Hough transform. To avoid considering all combinations of matches 
between image and model edges, RAF begins with a clustering phase. It divides the space 
of all rotations and translations into buckets. Then, for each match, RAF determines the 
range of rotations and translations that would align the model edge with the image edge, 
and places a pointer to the match in all the buckets corresponding to potential rotations and 
translations. After doing this for all matches, RAF searches for an object by first trying the 
buckets that contain the most matches. Each bucket should contain only roughly compatible 
matches, and the search then finds the largest set of matches that passes a stricter test for 
compatibility. Although effective in its domain, this type of grouping does not work well 
when one has knowledge of a library of different objects. If one performs clustering for each 
object model, the clustering will take a lot of time, and may well produce a large number 
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of collections of matches to consider. This approach also does not help reduce false positive 
recognitions too much, because it does not prevent RAF from considering collections of 
edges not really likely to have come from a single object. For example, GROPER would 
not consider an interpretation that used only one edge in a square of four edges, because 
the four edges would form a primitive. RAF’s grouping technique does not prevent it from 
considering such possibilities. 

Lowe’s system, SCERPO, differs from the above systems in that it explicitly makes 
use of grouping. It selects small groups of edges that have a good chance of coming from 
a single object, and that provide good recognition primitives, as we previously described. 
Lowe relies on grouping to limit his search more than some of the previously mentioned 
systems, because grouping not only limits the number of collections of image edges his 
system SCERPO must consider, but it also allows SCERPO to only consider matching 
each group to a small number of sets of object edges. It does this because it knows that, 
for example, it need only match parallel image edges to parallel object edges, or three 
co-terminating image edges to three co-terminating object edges. This reduction in search 
allows SCERPO to handle the more complex domain of recognizing three dimensional 
objects in two dimensional images. However, after selecting a limited number of initial 
groups, SCERPO then considers all viable matches in an unordered search. So, SCERPO 
must perform an appreciable amount of search. 

In summary, ACRONYM, LFF, SCERPO and RAF all make use of grouping to limit 
the amount of search needed to recognize objects. And these methods work well in their 
domains. However, they all have the difficulty that, while they restrict the needed search 
to manageable proportions, they still require an undirected search through a relatively 
large space of possibilities. These approaches still leave open the problems of growing 
computational complexity when used with libraries of objects, and do not fully address the 
problem of false positive identifications of objects. 

Instead of using grouping to just limit search, some recognition systems attempt to use 
segmentation to completely separate the image of an object from its background. This 
approach obviously makes recognition much easier. It does not require a search through 
a set of possible segments in the image, and will not produce false positive identifications 
caused by attempting a match involving two sections of the image unlikely to have come 
from a single object. This kind of segmentation has only been done in limited domains. In 
those domains, however, segmentation can provide an effective first step in a recognition 
system. 

Segmentation can work well when one has access to three-dimensional image data. Yang 
and Kak[28], for example, present a system that separates the topmost, convex object in a 
pile, using three dimensional data acquired with a laser scanner. This presents a somewhat 
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simpler problem then segmenting two dimensional images, however, because one may look 
for discontinuities in depth or curvature. Two dimensional images do not make this type 
of information available. 

Gross and Rosenfeldfll] are working on the problem of segmenting objects in two di¬ 
mensional scenes for the purposes of recognition. However, they restrict their domain to 
one containing unoccluded objects of uniform intensity. 

Segmentation of general two dimensional images presents quite a difficult problem. For 
this reason, GROPER takes a search-based approach, which attempts to explore the most 
likely segmentations first, but does not depend on determining a single, correct segmenta¬ 
tion. 

Previous recognition systems have not noted the problem of false positive recognitions, 
which we have found to present a serious challenge. The use of some grouping in these 
recognition systems may partly explain the fact that they have not produced significant 
numbers of false positive identifications. However, we will now argue that some special 
characteristics of the domains of many object recognition systems have allowed them to 
avoid this problem. 

Three aspects of GROPER’s domain make false positive identifications a potentially 
significant problem. First of all, GROPER looks for any object in a library, and not just a 
single object. Secondly, GROPER does not assume that it can tell, before recognizing an 
object, which side of an edge is part of that object, and which side belongs to the back¬ 
ground. And finally, GROPER can deal with simple objects, and with scenes containing 
objects similar in appearance to the ones for which it looks. Each of these three factors 
make false positive identifications much more common in GROPER’s domain than in that 
of many other recognition systems. 

Obviously, the more objects one knows about, the greater the chances of mistaking 
some randomly chosen edges in the scene for one of those objects. We just wish to note 
that GROPER works with relatively simple libraries; false positive identifications will be¬ 
come a much greater problem with libraries of thousands of objects that have flexible or 
parameterized parts. 

Not knowing figure from background greatly increases the number of combinations of 
edges to consider because each edge has two possible interpretations. This increases the 
chances that some such combination will look like an object in the library. 

Finally, the objects for which a recognition system looks, and which it uses in scenes, 
can make false identifications particularly unlikely. A collection of edges from different 
objects have little chance of looking like a complex object, which has many edges. They 
have a much better chance of resembling a simple object, with few edges, like the ones 
used in GROPER’s object library. Furthermore, we have tested GROPER on images that 
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contain objects similar to some of the objects in the library. For example, if a recognition 
system knows about two objects with common subparts, it takes less of a coincidence for 
one of these objects to look like the other. Many recognition systems are tested in a domain 
in which they look for a single complex object, in a scene containing that object and a few 
very different looking objects. 

8.5 Indexing, and Libraries of Objects 

This section will discuss four other recognition systems that have performed indexing. 
GROPER’s indexing system makes use of some ideas already explored in these systems. 
However, GROPER’s indexing differs from these systems in ways that make it fit well with 
the type of grouping that GROPER uses. This section will also briefly discuss some other 
approaches to recognition that handle libraries of objects. 

Kalvin et. al.[14], Wallace[25], and Thompson and Mundy[23] use indexing in recognition 
systems. They all describe a collection of edges in an image with a small number of 
parameters that do not depend on the orientation of the edges. They then look in a table 
to find object models that could have produced those image edges. However, the way 
these systems work prevents us from applying them directly to the kinds of groups of edges 
produced by GROPER. The system of Kalvin et. al. identifies objects that might have 
produced a single connected section of contour. It does not provide a way of using the 
information contained in the spatial relationship between two sections of contour presumed 
to come from a single object, or make provisions for objects that do not contain distinctive 
sections of contour. Wallace does make use of this type of information. Although unknown 
to the author during the development of GROPER, Wallace’s approach is quite similar to 
GROPER’s, but it only handles indexing that uses two unoccluded sections of an object’s 
perimeter. GROPER makes use of occluded sections of perimeter, and can combine the 
results of many separate indexing steps to find a globally consistent interpretation for a 
group of edges. Thompson and Mundy have developed an indexing scheme that can use 
only pairs of vertices in an image. They need not handle occlusion, since they assume 
they will find unoccluded vertices. GROPER makes use of more general collections of 
edges. However, Thompson and Mundy designed their indexing system to recognize three 
dimensional objects. 

Kalvin et. al. have an indexing scheme particularly suited to identifying distinctive un¬ 
occluded sections of an object’s perimeter. They divide the curved contour of an object 
model into discrete sections. They then describe each section of contour using five param¬ 
eters that do not depend on the orientation of the object. These parameters describe the 
curvature of a short section of contour. Then for each such short section of perimeter in 
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each object model, they make an entry in a virtual five dimensional table which they actu¬ 
ally implement as a hash table. To identify which contours from known objects might have 
produced a section of image contour, they perform a table lookup for each short section 
of the image contour, and combine the results. They could just take the intersection of 
these table lookups to find a section of model contour that might have produced the entire 
image contour, but in fact they use a more complicated heuristic to handle noisy data. 
This system has two important advantages over the indexing GROPER performs. First of 
all, it can handle curved data directly, instead of first approximating it with straight lines. 
Straight line approximations can introduce many types of error in the recognition process. 
And secondly, its library only needs to contain a number of entries dependent on the length 
of an object’s perimeter. GROPER stores a number of entries proportional to the square 
of the number of straight line approximations to the perimeter. The chief disadvantage 
of Kalvin et. al.’s approach is that it does not use the relationship between disconnected 
image segments thought to come from a single object. GROPER’s grouping system focuses 
on producing just this kind of information. This limitation suits Kalvin et. al.’s approach 
to the recognition of objects, which expects to find connected sections of contour visible 
that will distinguish objects from most of the others in the library. 

Wallace combines indexing with a clustering technique. Wallace’s system uses indexing 
to allow each pair of contour segments in an image to cast a vote for each object that might 
have produced it. The system then searches first for the object that gets the most votes. 
Indexing also produces matches between image contour segments and model segments that 
can guide the resulting search. Wallace indexes using both straight line approximations 
to contours in the image and also curved arcs that appear in contours. We will consider 
only the indexing that uses straight line approximations, because it most closely resembles 
GROPER. 

Wallace indexes with straight lines in the image in much the same way that GROPER 
does, except Wallace assumes that these lines will correspond to unoccluded sections of ob¬ 
ject perimeter. So, Wallace represents two non-parallel straight lines with three parameters: 
the angle they form, and the distance from the mid-point of each line to their intersection 
point. An occluded edge will have a different mid-point than the same, unoccluded edge. 
This does not present a problem to Wallace’s system, which only plans on using pairs of 
unoccluded edges from the same object to assist in recognition. Wallace then builds a three 
dimensional table, containing an entry for every pair of edges in each object model. An 
indexing step can then tell the system which unoccluded pair of model edges could have 
produced a pair of edges in the image. This characterization does not capture all the in¬ 
formation in a pair of edges, because two edges of different lengths might have the same 
mid-point. Wallace only puts a single entry in the table for each pair of edges, resulting in 
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a sparse table. However, this means that Wallace’s system does not explicitly account for 
error. 

GROPER’s indexing scheme primarily differs from Wallace’s in that it handles occluded 
edges and accounts for error. Instead of making entries in a table for the way two unoccluded 
edges may look, GROPER makes entries for the way they may look given every possible 
occlusion of the edges and given all possible sensing error. This results in a more dense table, 
but allows GROPER to use occluded edges in its recognition process. Also, since Wallace 
uses only unoccluded edges, the need to check the results of indexing for global consistency 
does not arise. GROPER uses indexing to find all the globally consistent matches to a 
group of edges. Chapter 6 describes how GROPER does this. 

Thompson and Mundy[23] also build a table that shows all possible relations between 
a pair of object features. Given two vertices in a three dimensional object, they use six 
parameters to describe the relationships these vertices can have when projected into a 
two dimensional image. They then determine the values these parameters have when the 
object is looked at from all view points, sampling the space of all view points at five degree 
intervals. They then make a table entry for every view point. This results in a great many 
table entries, and may cause some inaccuracy if the appearance of a pair of vertices changes 
much between two view points. This type of approach does not apply to GROPER’s 
domain, since GROPER needs to handle only two dimensional objects, but must index 
using arbitrary collections of edges. 

Ettinger[7] takes an approach somewhat si mi lar to Wallace’s, based on object parts. 
Ettinger’s system decomposes each object in the library into sub-parts, and first looks for 
these sub-parts in the image. One sub-part may appear in many objects. The system then 
detects features in the image, such as corners or bumps, at a coarse scale. Each feature 
votes for the existence of all sub-parts that have that feature. The system then tries to 
recognize each of these sub-parts, starting with the ones that have received the most votes. 
Like the approach of Kalvin, et. al. Ettinger’s approach indexes using only sections of 
unoccluded contour from an object. Ettinger’s system also has some interesting properties 
that allow it to recognize classes of similar objects, to a limited extent, but this aspect of 
it falls outside the domain of this discussion. 

We will discuss two other systems that have approached the problem of using libraries 
of objects in recognition without using indexing. Turney et. al.[24] and Knoll and Jain [15] 
have used sub-template matching to locate contour sections in an image that match some 
contour sections of objects in the library. They then use these matches as a basis for finding 
objects in the image. We will particularly discuss the relevance of these approaches to the 
problems of accuracy and computational complexity. We previously argued that the use of 
object libraries will make these problems particularly serious. 
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Turney et. al. have described a recognition system that performs clustering on the 
results of sub-template matching. For each object in a library of objects, this system 
selects some sections of the object’s contour that particularly distinguish that object from 
others. Turney et. al. refer to these contour sections as sub-templates. Their system looks 
in the image to find all the sub-templates that appear as part of image contours. For each 
sub-template found, the system can determine the rotation and translation that would align 
any object model that has a similar contour section with the contour section in the image. 
So the system uses each sub-template match as a vote for a particular object’s presence at 
a particular orientation in the scene. If a sub-template matches only a few object models, 
it casts a big vote for each of those objects. If it matches many models, it casts a weaker 
vote for those objects. The system selects the model and orientation that gets the most 
votes. 

The amount of work required by this system will grow with the size of the library of 
objects. Presumably, the more objects one knows about, the more sub-templates one must 
look for in the image, if one wishes each sub-template to be salient, that is, to look like 
relatively few sections of object models. Turney et. al. seem to rely on using salient sub¬ 
templates to avoid having too many contour segments voting for objects that did not really 
produce them. The rate of growth, however, will depend on the particular strategy used 
for choosing sub-templates. 

For accuracy, Turney et. al. seem to rely on finding a significant number of image 
contour segments that could not have come from too many different objects in the object 
library. Their system can not combine information contained in unconnected segments in 
the image. One can imagine many real images in which combining this kind of information 
would greatly improve the accuracy of a recognition system. GROPER, for example, works 
using similar objects with very simple contours. A relatively long segment of one of these 
objects may consist of a single straight edge, or of two straight edges joined with a right 
angle. This type of domain can cause a problem for Turney et. al. because if a sub-template 
matches such a simple segment, it will have to vote for most of the objects it knows about, 
and if it does not, it will not use that segment to help it recognize objects. 

Knoll and Jain[l5] present a system that uses sub-template matching to generate a large 
number of hypotheses. They then perform a verification step on each hypothesis to locate 
the good ones. So they select sub-templates that match a large number of different objects. 
In fact, they want sub-templates to match 0(y/N) different objects, where N indicates 
the number of different objects that the library knows about. They select 0{VN) such 
sub-templates, so that each object can produce some sub-templates. They look for these 
sub-templates in the image. Each one they find provides them with a hypothesis for the 
location of 0(V~N) objects, and they attempt to verify each hypothesis. They then accept 
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the hypotheses that receive the most verification. 

This algorithm allows Knoll and Jain to deal elegantly with the computational com¬ 
plexity of the recognition process. Both the process of locating features, rind the process of 
verifying hypotheses will take only O(VN) time. Furthermore, since they can look for each 
feature independently, with a parallel architecture they could assign a processor to the job 
of looking for just one feature. With 0(y/~N ) processors, they could locate all the relevant 
features in time dependent only on the image size and the model size. Similarly, with that 
many processors they could perform all the verifications in time that depends only on the 
image size. Thus, their approach solves much of the combinatorial problems of recognition, 
particularly when combined with parallelism. 

On the other hand, although parallelism allows us to run Knoll and Jain’s algorithm 
in constant time, we might still want to determine the minimum amount of computation 
required to perform recognition, independent of the number of processors required. So 
their system does not completely solve the problem of performing computationally efficient 
recognition. 

Also, their system may encounter two problems relating to accuracy when applied to 
larger libraries of objects. First of all, the larger the library used, the smaller the percentage 
of objects in the library a sub-template should match. To find sub-templates that match 
a smaller percentage of objects in the library, the sub-templates will become larger. But 
as they become larger, the sub-templates will have less chance of appearing in the image 
unoccluded. And to recognize an object, the object must produce an unoccluded section of 
contour corresponding to a sub-template. This problem will become more acute if a library 
contains flexible objects, which can match a wider variety of small sub-templates. This type 
of problem will depend a good deal on the particular nature of the object library used, and 
so it is hard to predict its severity. But in general, it seems that large libraries of objects 
will present a problem to recognition systems that rely on finding reasonably distinctive, 
connected contours in the image. 

Secondly, Knoll and Jain’s approach does not address the problem of accuracy. Their 
approach relies on generating a large number of hypotheses, a number that grows with the 
size of the object library. But, as we have attempted to show, this will result in increasing 
numbers of false positive identifications. The more hypotheses we consider, the more we 
will find that coincidentally receive support from different objects in the scene. 

8.6 Conclusions 

This chapter has examined some other recognition systems from GROPER’s point of view. 
It points out that many successful recognition systems have used some type of grouping to 
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improve their performance. These systems provide additional evidence that some form of 
grouping can assist in the recognition process. We also argue that no one has presented 
a system that solves all the problems inherent in performing recognition using libraries 
of objects. Several systems present interesting ways of coping with the computational 
complexity of recognition, particularly the indexing approaches of Kalvin et. al. Wallace 
and of Ettinger, and the approach of Knoll and Jain. However these systems may not 
perform robustly on scenes containing objects with significant occlusion using large libraries. 
And these approaches do not address the problem of false positive identifications that large 
libraries of objects create. 
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Chapter 9 


Results and Conclusions 


9.1 Introduction 

This chapter will present the results of some experiments with GROPER, and draw some 
general conclusions regarding its performance. Empirical results will measure the speed and 
accuracy of three different aspects of GROPER, as well as the amount of storage GROPER 
requires. We will then attempt to characterize the kinds of problems on which we can 
expect a grouping system like GROPER’s to work well. Finally, we will make another 
attempt to argue that for complex recognition tasks we will need some type of effective 
grouping system, even if these tasks prove too difficult for GROPER’s grouping system. 

First of all, we will look at how well GROPER’s grouping system works in various types 
of situations. We test the grouping system on scenes containing untextured two dimensional 
objects with fairly uniform lighting. In such scenes, most intensity edges arise from occlud¬ 
ing contours. We also test the grouping system on scenes with three dimensional objects. 
These scenes also produce intensity edges resulting mainly from occlusions. Finally, we 
examine GROPER’s performance on more realistic three dimensional scenes that produce 
many different kinds of intensity edges. 

Secondly, we will measure GROPER’s indexing capabilities. We want to know how 
much space the indexing tables typically require. Also, we want to know about its accuracy 
and speed. GROPER’s indexing system ensures that, except for bugs, indexing will always 
produce all sets of model edges that might have produced a group of image edges. So, in 
terms of accuracy, we wish to know how often indexing will produce a set of model edges 
that will not align with the image edges for any rotation or translation of them. And, since 
indexing requires GROPER to perform several intersections, we wish to know the expected 
cost of these intersections as well as the possible benefits of other implementations. 

Thirdly, we will examine how much GROPER’s grouping system benefits it by looking 
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at its accuracy at recognition and the amount of computation it needs. We do this by com¬ 
paring GROPER’s performance to that of an alternate recognition system, SEARCHER. 
SEARCHER differs from GROPER only in that it uses a backtracking search instead of 
grouping to decide which collections of image edges to use in finding objects. 

Finally, we will conclude by considering the use of grouping in two different domains. 
We will argue that GROPER’s grouping system can greatly benefit a recognition system 
that operates in a domain where we have a method of locating the occluding edges in an 
image, and where we have somewhat limited occlusions. Then we will argue that, although 
GROPER’s grouping system will not perform well in more complex domains, it does provide 
a basis for implementing more elaborate grouping systems. And, we will maintain that some 
more elaborate grouping system will prove essential in building recognition systems with 
truly human performance. 


9.2 Results of Grouping 

9.2.1 Description of Tests 

GROPER uses grouping to order its search for objects. We can judge the grouping system 
by how quickly GROPER succeeds in recognizing objects, but we can also evaluate the 
grouping system on its own, by seeing how frequently the first groups that GROPER 
decides to investigate really all come from the same object. This will provide us with more 
precise information, and also allow us to try out GROPER’s grouping system on scenes 
that contain three-dimensional objects, which GROPER cannot recognize. 

In particular, we have examined the pairs of convex sections of edges that GROPER 
decides to combine together. Other grouping issues affect GROPER’s performance, since 
it often recognizes objects using groups that contain only one convex section, or more than 
two sections. But the rate at which GROPER can find correctly interpreted pairs of convex 
sections plays the major role in GROPER’s overall performance. Also, looking at just this 
problem, we can see how well the distance and orientation constraints work. 

We have tested the grouping system by looking at the thirty groups that GROPER found 
most likely to have come from a single object. We decide GROPER was right in choosing a 
group if all the edges in the group really did come from a single object. Furthermore, when 
GROPER forms a group, it makes a decision about which side of each edge the object lies 
on, and which side is background. So we only decide that GROPER has made a correct 
decision if it has also made a correct figure/background judgment for each edge. 

We ran these tests on six sets of images, of varying degrees of difficulty, which we will 
refer to as test sets one through six. Four tests involved five images each, the others used 
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Figure 9.1: The perimeters of the models used in the first set of tests of GROPER. 

two and three images. We tried GROPER on four sets of images made from polygonal, 
two-dimensional objects. Shapes cut out of black paper appeared on a white background 
in these images. GROPER worked best on these images because most of the edges in them 
came from the perimeter of objects, although a few edges would appear due to shadows or 
slight variations in the background. Figure 9.4 shows an example of one of these scenes, and 
figure 9.5 shows straight line approximations to the intensity edges in it. We made these 
tests using different assortments of objects. In test one we used the nine objects whose 
edges are shown in figure 9.1, and a couple of additional objects, unknown to GROPER. 
For test two we used scenes containing eight of the sixteen objects shown in figure 9.2. A 
third set of images contained all sixteen of these objects. And a fourth set used the six 
objects shown in figure 9.3. 

Next, in test five, we tried GROPER on scenes made up of real objects. These ob¬ 
jects also had uniform reflectance, producing no texture edges. And they appeared on a 
white background. But they were three dimensional, and curved. Because of their three 
dimensionality, lighting effects created many edges that did not come from the perimeter of 
any object, including edges from shadows, specularities, and changes in surface orientation. 
Figures 9.6 and 9.7 show examples of one such image, and the edges it produces. As we 
will see, GROPER performed less well on these images, probably due to the more complex 
edges they produced. 

Finally, for test six, we tried GROPER on two images of real-world scenes, containing 
all sorts of textured objects, and varying lighting effects. GROPER did least well on 
these images, and, in fact, even a person may find it difficult to interpret the straight line 
approximations to edges that result from this type of image. Figures 9.8 and 9.9 provide 
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Figure 9.2: The perimeters of the models used in the second and third sets of tests of 
GROPER. 



Figure 9.3: The perimeters of the models used in the fourth set of tests of GROPER. 
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Figure 9.4: A scene from the second set of tests. It contains the objects in figure 9.2. 



Figure 9.5: Straight line approximations to the intensity edges found in the image in figure 
9.4. 
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Figure 9.6: An image of three dimensional objects on a white background. The white 
outline indicates the edges found in the image. 



Figure 9.7: Straight line approximations to these edges. 
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Figure 9.8: An image of a room, containing an arbitrary set of objects. 


an example of this type of image, and the edges it produces. 

9.2.2 Discussion of Results 

Figure 9.10 summarizes the results for these six different sets of images. These results tell 
us that GROPER’s grouping system works quite well on scenes containing two dimensional 
objects, particularly in comparison to the random selection of groups of edges. There are 
also some problems with the way GROPER’s grouping performs on these images, which 
these tests do not reveal. We will discuss those problems in section 9.4. The grouping tests 
do tell us that GROPER works somewhat less well on scenes with more objects, such as 
the ones in set three (see figure 9.11 for an example), and on scenes with three dimensional 
objects. Still, we get good results on these images, and a random strategy would also 
do more poorly on these images than on those of the other two dimensional test groups. 
Finally, we see that GROPER does not perform well on the more realistic three dimensional 
scenes of test six. This seems to be due to the poor quality of edges found in the scene; these 
edges do not seem to provide enough information to even allow people to easily recognize 
objects. 

For tests one, two and four, GROPER averages between 7.8 and 8.8 correct pairs of 
convex sections chosen out of thirty. To evaluate these results, we need to get a rough idea 
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Figure 9.9: Straight line approximations to the edges in this image. 


2D 2D 2D 2D 
Objects, Objects, Objects, Objects, 3 D Real 
Test 1 Test 2 Test 3 Test 4 Ob jects, Scenes 

5 5 3 5 5 

8.8 8.4 5.67 7.8 5.8 

Figure 9.10: The results of grouping experiments. 
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Figure 9.11: A harder two dimensional scene, with sixteen objects 
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Figure 9.13: Another image from test set two. 


of the number of total correct groups, and the number of correct groups of pairs of convex 
sections we would expect to get if we just combined convex sections at random. So we will 
carefully examine GROPER’s performance on a typical image. 

The image in figure 9.13, with edges in figure 9.14 provides a good example. First of 
all, GROPER succeeds in recognizing one of the objects in the image using only a single 
convex section of edges. After removing the edges that object accounts for from the image, 
GROPER divides the remaining edges into 33 convex sections. Ten of the thirty pairs of 
sections GROPER finds most likely to come from a single object really do. Figure 9.15 
shows some of these ten groups, and figure 9.16 shows a few incorrect groups in the top 
thirty. At this point, the image actually does have eleven different pairs of convex sections 
that really come from a single object. Since the image has a total of 512 different pairs of 
convex sections, if we chose thirty pairs at random, the expected number of correct pairs 
we would find would be .65. So we can see that the distance and orientation constraints 
allow GROPER to combine pairs of convex sections very well, and much better than at a 
random rate. 

The performance of the grouping system drops somewhat in tests three and five. We 
might expect that, because these images contain more objects than do the others. And 
test five has many more edges that do not correspond to object boundaries. These factors 
produce many more convex sections in images, and increase the chances that some convex 
sections will appear to go together well, even though they do not come from a single 
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Figure 9.14: The edges it produces. 



Figure 9.15: Some of the correct groups GROPER finds in the image, out of the thirty it 
picks first. Each pair of convex sections is circled. 
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Figure 9.16: Some of the incorrect groups GROPER finds. The edges in each group are 
circled. 

object. These factors would also make a random search for objects take much longer. And 
GROPER’s performance on these images does still seem good. Figures 9.17 and 9.18 show 
some of the first groups GROPER picks from the image shown in figure 9.6. 

GROPER’s grouping system begins to have grave problems when confronted with an 
image like the one in figure 9.8. This image produces a great many edges that do not come 
from occlusions. And objects fade into their background much more, making it harder 
to form good convex sections of edges. In fact, the edges produced by this image form 
a fairly chaotic picture, even for a person. This tells us that we can not take straight 
line approximations to the edges in a realistic image, and expect to perform the kind of 
perceptual grouping on which GROPER relies. For this type of image, we must either 
develop much better techniques for finding the occluding edges in an image, or take quite a 
different approach to grouping, an approach that does not rely solely on edge information. 


9.3 Results of Indexing 

We want to test GROPER’s indexing system to determine its space requirements, its ac¬ 
curacy, and its efficiency. In terms of accuracy, we have designed the indexing system so 
that it will always tell us all the collections of model edges that could match a set of image 
edges, except for bugs, of course. But we also want to know how often it tells us about 
collections of model edges that could not really have come from those image edges. These 
types of errors can have two possible effects. First of all, GROPER decides whether to try 
to verify matches based on the number of sets of model edges that match a particular set 
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Figure 9.17: The correct groups GROPER finds in the image in figure 9.6, out of the thirty 
it picks first. Each is circled. 



Figure 9.18: Some of the incorrect groups GROPER finds. Each group is circled. 
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Object 

Number 

1 

2 

3 

4 

5 

6 

7 

8 

Number 
of Edges 

6 

10 

8 

8 

9 

9 

13 

9 

Number 
of Table 
Entries 

7,317 

14,719 

10,186 

13,361 

15,735 

6,945 

20,729 

12,578 

Object 

Number 

9 

10 

11 

12 

13 

14 

15 

16 

Number 
of Edges 

11 

20 

18 

31 

16 

11 

8 

8 

Number 
of Table 
Entries 

30,209 

66,586 

20,741 

161,388 

17,465 

17,379 

9,049 

11,276 


Figure 9.19: The number of table entries for each of the sixteen models shown in figure 9.2. 

of image edges. If we get too many false positive matches per indexing step, GROPER 
will not attempt to perform verification in situations where it should. This will effect the 
accuracy with which GROPER recognizes objects. Secondly, GROPER indexes with large 
collections of image edges by performing a series of table lookups for the pairs of image 
edges, and combining the results with intersections. If most of the matches found by a 
table lookup could not come from the pairs of image edges we are looking up, then most 
of the work required to combine these results is wasted. This will effect the speed with 
which GROPER recognizes objects. So we have collected some statistics that allow us to 
determine the amount of space required for indexing, as well as its impact on GROPER’s 
speed and accuracy. 

GROPER uses a lot of space. We know from its construction that GROPER’s indexing 
table will require order E n 2 table entries, where n ranges over the number of edges in each 
object in the library. This number becomes large. Moreover, GROPER makes a large num¬ 
ber of table entries for each pair of edges. Figure 9.19 provides information about storage 
requirements. For the library of the sixteen objects shown in figure 9.2, grouping had to 
make over 400,000 table entries. This resulted in overall storage requirements significantly 
in excess of the available core memory of a Symbolics 3600 lisp machine. So recognition us¬ 
ing this library became slow, as the random accessing of this indexing table caused constant 
paging. 

These high space requirements result partly from the large number of entries required 
for each pair of edges. For example, the first object in figure 9.2 has six edges, and fifteen 
different pairs of edges. Yet it requires over 7,000 table entries, an average of almost 500 
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Number of Randomly 

Chosen Edges per Trial 

2 

3 

4 

5 

Number of Trials 

200 

2,100 

6,100 

32,785 

Average Number of 

Verifiable Matches per Trial 

5.7 

.40 

.02 

.0007 

Average Number of 

Unverifiable Matches per Trial 

25.06 

3.05 

.28 

.03 


Figure 9.20: Indexing was performed with random collections of edges. This table shows 
the number of correct positive responses, as well as the number of false positives. 


entries per edge pair. As implemented, GROPER can handle a small number of objects 
that do not contain too many edges, like the libraries in figures 9.1 and 9.3. With some 
work, GROPER could probably handle somewhat more complex libraries. But the indexing 
system GROPER uses is not well suited to handling large libraries of objects. 

Figure 9.20 presents the results of a second set of tests of GROPER’s indexing system. 
To test GROPER, we first created random collections of edges. We did this by giving each 
edge an end point in a randomly chosen spot in a square 150 pixels wide. We then selected 
a random length, and random angle for each edge. We performed indexing with each group 
of edges, to see how many matches GROPER would find. For this test we used a lookup 
table containing information about the sixteen objects shown in figure 9.2. Then for each 
match, we performed a verification step, described in chapter 7, to see if we could really 
find a single rotation and translation to align the matched model edges with the group 
of image edges. This tells us how often GROPER finds unverifiable matches per indexing 
step, and the ratio of verifiable to unverifiable matches found. 

The results show that GROPER’s indexing errors should not effect its accuracy in most 
cases. Usually, by combining two convex sections of edges GROPER obtains groups of four 
or more edges. An indexing step using four edges will, on the average, produce about .3 sets 
of matching model edges, as figure 9.20 shows. With five edges we get an average of less 
than .03 matches. This indicates that once we have a group of four or more image edges 
that come from the same object, we do not need to expect too many spurious matches 
to interfere with GROPER’s attempt to identify the object. However, we do find that 
with a group of three image edges we get an average of about 3.05 matches. A lookup that 
produces more than four matches will cause GROPER to decide not to attempt verification. 
Furthermore, most of these matches do not actually pass a verification test. So a better 
indexing scheme, which only told us about verifiable matches, might well produce better 
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recognition results in this case. 

The high rate of false positive matches found by indexing also has an effect on GROPER’s 
speed. Recall that GROPER finds matches for large collections of edges by performing a 
separate indexing step for every pair of edges, and then combining the results. Three quar¬ 
ters of the matches produced by indexing with a pair of edges will not pass a verification 
test. So GROPER wastes much of the time it spends combining these results. This has 
not caused too much of a waste of time overall only because the effectiveness of grouping 
prevents GROPER from having to perform too many indexing steps. 

GROPER’s indexing system does prove adequate for its needs. It works quickly enough 
so that GROPER usually spends less than half its time indexing. And it provides results 
that do not for the most part interfere with GROPER’s accuracy. However, a more accurate 
indexing system would probably work much faster. It would also require less storage space. 
For these results indicate that the majority of table entries for a pair of edges do not 
really correspond to possible values of the parameters that describe those edges. The 
simplifications made to allow us to easily build the lookup table have resulted in many 
inaccurate entries. 


9.4 Results of Recognition 

We tested GROPER’s overall performance by comparing it to that of another recognition 
system called SEARCHER. SEARCHER works just like GROPER, using the same indexing 
and verification modules. It also uses the same criteria to decide whether or not a match in¬ 
dicates the presence of an object. However, SEARCHER does not use grouping. GROPER 
uses grouping to tell it which collections of edges to use to try to locate objects. Instead, 
SEARCHER performs a simple backtracking search through the space of all possible sets of 
edges. It does not have to explore this entire space. Like GROPER, when SEARCHER finds 
that some edges could not all come from any known object, SEARCHER does not consider 
any other collections of edges that include these. And, like GROPER, when SEARCHER 
recognizes an object it removes from consideration any edges that this object can explain. 
Since SEARCHER also has no way to tell the object side from the background side of 
an edge, it must consider both possibilities in its search, just as GROPER must impose 
figure/ground judgments on its groups. By replacing GROPER’s guided search for objects 
with an undirected search, SEARCHER shows us exactly how much GROPER’s grouping 
component adds to its performance. 

We tested GROPER and SEARCHER on the four sets of images of two dimensional 
objects. Figure 9.21 summarizes the results of these tests. These results show GROPER 
always performing much better than SEARCHER, requiring relatively few indexing steps. 
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Test 1 

Test 2 

Test 3 

Test 4 

Number of Known 

Objects in Each Image 

9 

8 

16 

6 

Average Percentage of Objects 
Correctly Located by GROPER 

84% 

65% 

60% 

97% 

Average Number of False Positive 
Identifications by GROPER 

.8 

1 

5.7 

0 

Average Number of Edge 

Groups Considered by GROPER 

98.8 

75.2 

452.3 

18.6 

Average Percentage of Objects 
Correctly Located by SEARCHER 

58% 

48% 

31% 

60% 

Average Number of False Positive 
Identifications by SEARCHER 

14 

8.2 

*16.5 

5.2 

Average Number of Edge 

Groups Considered by SEARCHER 

25,984.6 

5,805.6 

*30,210 

2981.6 


Figure 9.21: The results of recognition experiments. *These figures do not include the 
results for one image because it took SEARCHER so long that it invariably caused the 
garbage collector on the Lisp Machine to crash. 


However the results show a significant variation in GROPER’s performance on the different 
sets of tests. We will first compare the performance of the two systems, and then discuss 
possible reasons why GROPER performs less well on certain types of images. 

These experiments demonstrate the effectiveness of GROPER’s grouping system both in 
reducing the number of indexing steps needed to find objects and also in improving the ac¬ 
curacy of the system. GROPER always requires far fewer indexing steps than SEARCHER 
does when the two systems attempt to find all the objects they can in an image. For 
example, with the second set of test images, SEARCHER performed approximately 75 
times as many indexing steps, and with the fourth set of images it performed about 160 
times as many. This greater efficiency clearly indicates the success of GROPER’s grouping 
constraints in ordering the search through collections of edges. 

We did not compare the total running time of the two systems, because we made no 
effort to optimize SEARCHER, and so it ran extremely slowly. And both systems took 
quite a long time to run in tests two and three, due to the constant paging the large library 
required. On the first set of tests, GROPER took an average of about three minutes and 
twenty seconds to locate all the objects it could, or an average of about twenty-two seconds 
for each known object in the scene. On the fourth set of images, GROPER took an average 
of one minute and thirteen seconds, or twelve seconds per object. 
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Figure 9.22: The objects GROPER found in the image in figure 9.4. 



GROPER also consistently performed more accurately than searcher. It averaged 2.4, 
1.4, 4.7 and 2.2 more objects correctly found on the four tests. And GROPER had few false 
positive identifications, except on the third set of tests, while SEARCHER always made 
quite large numbers of mistakes. Figures 9.22 and 9.23 show the objects that GROPER 
and SEARCHER found in the image in figure 9.4. However, we must approach these com¬ 
parisons of accuracy a little skeptically, because we made a number of decisions regarding 
accuracy to optimize GROPER’s effectiveness, not SEARCHER’S. 

We could have altered SEARCHER in two possible ways to improve its effectiveness. 
First of all, we could have increased the amount of computation SEARCHER needed, and 
improved its accuracy. Frequently SEARCHER would accept an identified object based 
on relatively little evidence, and then remove from future consideration edges that could 
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have helped provide much stronger evidence for a different object. So, instead of making 
use of the first verifiable match it found, SEARCHER could have looked in the image for 
all the objects it could find, and then accepted the best ones. This would have required 
more computation by removing one of the ways both it and GROPER prune their searches. 
Secondly, we use loose criteria for determining when to accept a match as valid. The two 
systems allowed an error of seven pixels in the location of any part of an object, and an 
error of in the angle between two edges. Furthermore, both systems accepted a match 
that accounted for as little as twenty-five percent of an object’s perimeter. We used these 
criteria because with tighter criteria GROPER would occasionally miss an object, and 
even with these standards GROPER did not make many false positive identifications. But 
we could probably have improved SEARCHER’S performance by tightening its standards 
for accepting a match. So we could certainly have made SEARCHER a more accurate 
recognition system by changing some parameters that we picked to optimize GROPER. 
But the fact that GROPER could get away with accepting the first valid matches it found, 
using loose standards, while SEARCHER could not, does indicate that grouping has helped 
make for a more accurate recognition system. 

The reader should not, however, assume that SEARCHER typifies the performance of 
existing recognition systems. Certainly, many systems would perform more quickly and 
accurately than SEARCHER. But one way many systems do this is by using a certain 
amount of grouping. So the comparison with SEARCHER only reveals how much grouping 
helps GROPER, it does not tell us how GROPER would compare with existing recognition 
systems. 

The tests run on GROPER show quite a variation in its accuracy. On the first set 
of tests, GROPER performed with good accuracy. But then on the second and third 
sets of images, GROPER performed much less well. These two sets of images used a 
different collection of objects. On closer examination of the results, we found that GROPER 
consistently failed to find certain objects in these images, while locating others successfully. 
For example, GROPER never found object six, and found object sixteen only one time 
in seven images. Figure 9.2 shows these objects. In general, GROPER often failed to 
locate objects that contained few parts, each of which had some short edges. The algorithm 
GROPER uses to make straight line approximations to edges usually eliminates these small 
edges, and they may also fail to appear due to the inherent difficulty of accurately locating 
edges at the corners of objects. Figure 9.11 and figure 9.12 provide an example of this 
problem. Without these short edges, GROPER has a hard time forming good convex 
sections to use in grouping. This would account for much of the difficulty GROPER has in 
detecting certain objects. 

To check this hypothesis, we made a fourth set of tests using only six objects, which 
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Figure 9.24: Straight line approximations to the edges produced by an image from test 
four. 

GROPER had proven adept at recognizing. Figure 9.3 shows these six objects, and figure 
9.24 shows the edges derived from one of the images in the test. The results proved en¬ 
couraging, showing twenty-nine of thirty objects correctly identified, with no false positive 
identifications. In retrospect, this difficulty may account for many of the mistakes made 
in the first set of images as well. If the hypothesis is correct, we could overcome these 
problems by using low-level processing better suited to GROPER. GROPER does best 
when early processing gives it connected edges it can easily group together. We might try 
a straight line approximation algorithm that does not eliminate short edges that connect 
other edges. Or we might use the curved sections of edges themselves for grouping, avoiding 
the simplifications of straight line approximations. 

The difficulties GROPER had in recognizing objects in tests two and three, as well 
as its difficulty in grouping on test six, do show certain inherent problems in GROPER’s 
approach to grouping. This approach requires first finding meaningful sections of occluding 
edges in an image. These may be hard to come by in real scenes or in scenes with heavy 
occlusion. In those cases we would still expect the distance and orientation constraints to 
apply when appropriate. But other constraints will have to supplement them, helping us 
analyze sections of image where we can not find extended occluding edges. 


9.5 Conclusions 

This thesis has had three main goals. First, and foremost, we wanted to show the im¬ 
portance of using grouping to deal with difficult recognition tasks. Secondly, we wanted 
to show that we can effectively approach the grouping problem by looking at the image 
formation process, to see what it tells us about which parts of an image have the greatest 
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likelihood of coming from the same object. Finally, we wanted to show that we have suc¬ 
cessfully understood the influence of some important factors in grouping. The effectiveness 
of GROPER’s grouping system helps to support each of these contentions. 

To argue for the overall importance of grouping, we have maintained that when tackling 
complex recognition tasks, the greatest difficulties we encounter will be growing computa¬ 
tional requirements and growing problems of accuracy. Handling large libraries of objects 
increases the number of possible matches between image features and object features. Fur¬ 
thermore, flexible objects and three dimensional objects will make it harder to eliminate 
incorrect hypotheses. We will need to match more of the image to the object before we can 
determine whether that match will work out, forcing us to consider more possibilities. And 
large libraries of flexible objects will create a much greater probability that some known 
object will have some way of matching any arbitrary collection of image edges, causing false 
positive identifications. GROPER has shown that grouping, especially when combined with 
indexing, can greatly reduce the combinatorics of recognition. Instead of searching through 
a huge number of possible matches, we can select a few groups of image edges, and quickly 
determine which objects might have produced them. We can use grouping to improve the 
accuracy of a recognition system because it provides us with an additional source of infor¬ 
mation to use when evaluating matches between the image and the object models. Instead 
of just looking at whether we can align an object model with some image edges, we can also 
look at whether those image edges seem to all come from the same object. The mistakes 
SEARCHER makes show that even with relatively simple recognition tasks, we can not 
reliably base our decision about whether to accept a match purely on the basis of how well 
a model aligns with image edges. So we have argued for the use of grouping by pointing out 
potential difficulties in extending current recognition systems to handle harder problems, 
and by demonstrating that grouping can help alleviate these problems. 

We also wanted to show that one need not resort to completely ad-hoc solutions to 
the problem of grouping. Gibson (Gibson[8], for example) has persuasively argued for 
the importance in vision research of understanding constraints provided by the physical 
world and the image formation process. Marr and his followers (Marr[20], for example) 
have effectively applied this viewpoint in computer vision research. In this thesis we have 
attempted to capitalize on this point of view in analyzing the problem of grouping. We 
have attempted to understand the factors in the image formation process that determine the 
influence of distance and orientation on the likelihood that sections of an image originate 
with the same object. We have presented only a rudimentary analysis of this problem. 
Yet this analysis has still provided insights that have helped to build an accurate grouping 
system. And the success of GROPER’s grouping system provides some evidence that this 
analysis is at least on the right track. 
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Appendix A 


Making Random Objects 

We have performed empirical tests to find a probability distribution for the lengths of 
occlusions that random objects produce. Chapter 3 reports the results of these tests. 
Performing these experiments required the construction of random objects. This appendix 
will discuss some problems involved in building random objects, and how we built them for 
these experiments. 

We built random objects out of randomly constructed convex parts. To make a random 
convex shape, we selected the size of the shape, and its number of edges, according to a 
probability distribution. We then used a simple algorithm to construct a connected, closed 
shape that would meet these two criteria. 

We wanted particularly to make sure that we constructed shapes with a random size, 
because this seemed like the factor most likely to influence the size of the occlusions pro¬ 
duced by the object. As chapter 3 mentions, we expect large objects to produce longer 
occlusions than short ones. So we selected the area of the convex shape from a uniform 
random distribution from 0 to an arbitrary constant. Notice that we would have gotten 
different results by choosing the perimeter of the object from a uniform distribution, or 
by randomly choosing the distance from the viewer to the object. These choices would 
have produced smaller objects, because the area of an object increases with the square of 
its perimeter. So, since we had no convincing reason for choosing one over the other, we 
picked the size distribution that would tend to produce the worst results for our hypothesis 
that short occlusions occur more often then do long ones. 

To determine the number of sides of a convex shape we picked a number between three 
and seven, with each number equally probable. 

Then, to construct a random convex polygon, we started with a randomly chosen tri¬ 
angle, and added edges, one at a time, until we had enough. To make a random triangle, 
we chose a base with a random length. We then picked a number between 0 and ir for the 
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angle between the base and one edge, and we picked a number between 0 and w minus the 
first number for the angle between the base and the second edge. The edge length and 
the two angles define the triangle. We then recursively added one new edge at a time to 
this polygon. To add an edge, we first picked one of the polygon’s edges at random, and 
removed it. We connected the polygon again by adding two randomly chosen edges, subject 
to the constraint that they must maintain the convexity of the polygon. Having created a 
random convex polygon with the appropriate number of sides, we then scaled the polygon 
so that it would have the appropriate area. 

We conducted experiments using objects with many convex parts. To form an object 
with a specific number of parts, we created each convex part separately, and then randomly 
joined them together. To connect a convex part to an object, we randomly located each 
of them in an image sized space. If they intersected, we joined them at the points of 
intersection, and removed all overlapping material. If they did not intersect, we tried 
randomly locating them again, until they did intersect. 

This method of forming random objects has the disadvantage that it makes it difficult 
to characterize the distribution of all the variables that describe a random object. For 
example, we can not easily determine the distribution of the lengths of edges in such a 
random object. We can not even easily tell how many edges a random object has, since 
joining two convex parts together may eliminate some of the edges, or it may not. However, 
even if we could easily characterize these distributions, we would have no principled basis 
for choosing one distribution over another. None of these distributions would describe the 
real world, which does not contain random objects at all. So instead, we pick an arbitrary 
“random” world, which we can easily test. We do not expect that different kinds of random 
objects would produce dramatically different results. In any event, we do not intend the 
experiments that use these objects to prove anything about general random worlds. Rather, 
we perform them to gain confidence in our intuitions about how occlusions occur in the 
real world, and why short occlusions will occur more often than long ones. 
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Appendix B 


The Likelihood of a type 2 
Orientation 


This appendix derives the probability of a type 2 orientation occurring between randomly 
oriented groups, in a simplified situation. We suppose that each group is infinitesimally 
small. We then calculate the probability that the projections of the two groups intersect. 
Chapter 3 provides the background needed to understand this appendix, and makes use of 
the results. 

We begin with some definitions. Call the two groups A and B. Since they are infinitesi¬ 
mal, A and B occur on points in the plane, which we will also call A and B. A and B both 
have projections. If finite, these projections will actually be infinitesimal also, and only 
cover either point A or point B. If infinite, this projection will have an angle between 0 
and x. We will call the angles of A’s projection a, and of B’s projection /3. Two ray’s will 
bound A’s projection. We will call them a\ and a 2 . bi and & 2 will refer to the rays that 
bound B’s projection. Without loss of generality, we can assume that A lies at the origin, 
and B lies on the x axis. We will call the angle between ai and the x axis 6 , and the angle 
between b\ and the x axis ip. Figure B.l illustrates these variables. 

If either group has a finite projection, we can easily produce an answer. If group A 
has a finite projection, then the two groups’ projections intersect only when point A lies 
inside B’s projection. Since B’s projection has a uniform, random orientation, it will have 
a probability of ^ of covering point A. Similarly, if B has a finite projection there will be a 
probability of ^ of a type 2 orientation occurring. And of course, if both groups have finite 
projections, they can not intersect. 

We now assume that both groups have infinite projections. We will determine the 
probability that these projections intersect by dividing the problem into eight different 
parts. First of all, we will consider separately the cases when a < f and when a > j. 
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Y 
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A X 


Figure B.l: Two tiny groups with infinite projections. 


Then for each of those cases, we will divide the possible orientations of A’s projection into 
four groups, depending on the quadrant in which the ray a\ falls. 

First, suppose a < f. 

Suppose that a\ falls in the quadrant where x is negative and y is positive, that is, 
j < 9 < 7T. This tells us first of all that all of A’s projection falls in the half-plane where y 
is positive. So B can not fall in A’s projection. In fact, the projections intersect only when 
either A falls in B’s projection, or when 62 intersects a\. This happens if ip — (3 < 6, i.e. if 
ip < $ + (3. This occurs whenever ip has a value between 0 and j +/3. Furthermore, since 0 
has a uniform distribution between j and w, the projections will also have a fifty percent 
chance of intersecting if ip has a value between | + (3 and tt + /3. So the total probability 
of intersecting projections is: 

f +( 3 + f 

2tt 

Suppose that a\ falls in the quadrant where x and y are both positive, i.e. that 6 < j. 
We must consider two possibilities. A’s projection may all fall in one quadrant. This 
happens when a < 6. Or, A’s projection may fall in two quadrants. In the first case, 
the projections will intersect whenever 0 < ip and ip — (3 < 0. As 9 has equal chances 
of having any value from a to the chances that ip achieves an appropriate value are: 
(/3 + * -h. Furthermore, the situation where a < 9, occurs with probability 1 - t. 

So, the total probability of a being less than 6 , and the projections intersecting, is: 


(/3 + 


2a + ir 
4 


)*( 1 - 


2a 1 
—) * — 
7 x ' 2i r 
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In the second case, when 9 < a, the projections intersect only when they intersect over 
an infinite space, that is, if b\ or 62 lie in between a x and 02 . This has a probability 
of occurring. So, overall, the probability that 9 will be less than a, and that the two 
projections will then intersect is: 


(a 4-/3) 2 a _ 2a 2 + 2 aj3 

2 x x 2 x 2 

Combining these two results, we find that, if 9 < the probability of projections 
intersecting is: 


(G 3 


2a 


4- 7r . , Sad + 4q 2 + 2ax 2a 2 + 2a/3 1 

-) _ ( — :-) -|- (-—)) * — 

4 ’ K 4t r 1 ( 7 r ” 2tt 


, n a 7 r 2a/3 a 2 

^ + 2 + J~ —-T 


a 2a 2 2a/3 1 

- H-4-- * — 

2 x x ' 2 tt 


7 r a 2 , 1 
- (/3 + 4 + T'S 

Next, suppose ai falls in the quadrant where x is positive and y negative. Once again, 
B can not fall in A’s projection. In this case, a 2 makes an angle of a 4 - (2x - 9) with the 
x axis. An intersection occurs if B’s projection extends anywhere within this range. This 
happens if 0 — a < 0. If xp < /3, then A falls inside B’s projection. So, the total chances of 
projections intersecting are: 


<T + “ + «i 

Finally, suppose ai falls in the quadrant where x and y are both negative. First of all, 
B may fall in A’s projection, with probability aK Secondly, if this does not happen, an 
intersection occurs if > 9 — a, or if V’ < 0. Given that B does not fall in A’s projection, 
9 will randomly range from to ^ — a. So an intersection occurs when falls in a range 
that varies between it to 2x and y - a to 2x, or ip may fall in the range from 0 to f3. 
Overall, the chances of this happening are: 


(x4-/3)4-(f4-/34-a) 1 ?—a 

- i - * - * ■ £ —z - 

2 2 x f 

When we add in the chances of A’s projection including B, we find that the total probability 
of the projections intersecting is: 


( 


- 2 a* + x/3 - 2/3a + - a 2 


x 


+ ia)~ 



. 2 f3a a 2 1 

+ 3a + /3---— 

x x 2 x 
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We now know, if a < f, the profcahiliiy of tfce project***# of A and B intersecting, 
depending on the quadrant in whkh a s lies, : Sininn Bn ; iHa^ of 

projections intersect. Doing tills, we get: 

*+»+—* 

2r 


Similar, tedious reasoning produces the same result when a > 






Appendix C 


Lower types and Whether Groups 
Come from the Same Object 


This appendix shows that, all other factors being equal, if two groups have a lower type of 
orientation they will have a greater chance of coming from the same object. 

We can prove this mathematically. First of all, we will prove that: 

P{typei\Oi = C> 2 ,rd) P(type 2 \Oi - C> 2 ,rd) 

P(typei\Oi ± 0 2 ,rd ) P{type 2 \O x 0 2 ,rd) 

(In this appendix we abbreviate restofdata with “rd”, to make some long formulas more 
readable.) The ratio on the left indicates the chances that the two groups come from 
different objects when they have a type i relationship. The ratio on the right indicates the 
same thing for a type 2 relationship. So the inequality asserts that, given the same data, 
there is a greater chance the groups come from different objects when they have a type 2 
relationship. 

P{type\\Q\ = 0 2 ,rd) _ 

P(typei\Oi 0 2 ,rd ) 

P(typei\same , Oi — 0 2 , rd ) * P{same\0\ = 0 2 ,rd) 

P(typei\Oi ^ 0 2 ,rd) 

P(typex\adj,Oi = 0 2 ,rd ) * P{adj\0\ = 0 2 ,rd) 

+ P(typei\Oi ^ 0 2 ,rd) 

P{type\\notadj,0\ — 0 2 ,rd) * P(notadj\0\ = 0 2 ,rd) 

+ P(typei\Oi ^ 0 2 ,rd) 

where “same” stands for the two groups coming from the same convex section of the same 
object, and “adj” and “notadj” denotes that they come from adjacent or non-adjacent 
sections. Chapter 3 pointed out that P{type\\same, 0\ — 0 2 ,rd) = 1, and described the 
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assumption that groups from different sections of the same object have random orienta¬ 
tions, subject to the constraint that groups from adjacent sections can not produce a type 2 
orientation. That assumptions tells us that: 


P(typei\adj,Oi = 0 2 ,rd) = 


_ P(typei\Oi ^ 0 2 ,rd ) _ 

P(type\\Ox £ 0 2 ,rd) + P(type 2 \0 1 £ 0 2 ,rd) 


and that: 

P{type\\notadj,Oi = 0 2 ,rd) = P{typei\Oi ^ 0 2 ,rd) 


From this we find: 

P(typei\Oi = 0 2 , rd ) 
P(typei\Oi £ 0 2 ,rd ) 
P(same\0\ = 0 2 ,rd ) 
P(typei\Oi 0 2 ,rd) 


+ 


P(typei\Oi^Oj,rd) _ 

P(typei\Oj ^C>2,rd)-\-P(type2\Oi ,rd) 


* P(adj\0\ — 0 2 ,rd ) 


P(type x \Oi 0 2 , rd) 

P(type x \Oi 7- 0 2 ,rd)* P(notadj\O x = 0 2 ,rd) 
+ P(type x \Oi ^ 0 2 ,rd) 


which equals: 

P(same\Oi = 0 2 ,rd) 

P(type 1 \0 1 0 2 ) 

_ P(adj\Oi = 0 2 ,rd ) _ 

P(type x \Oi ^ 0 2 , rd) + P(type 2 \O x ^ 0 2 ,rd) 
+P(notadj\O x = 0 2 ,rd ) 


Using similar reasoning we find that: 


P{type 2 \O x = 0 2 , rd) 
P(type 2 \0 1 ^ 0 2 , rd) 


which equals: 


Clearly: 


P(typei\Oi^O^,rJ)+P(t^e 2 'loi^02,rd) * P{ ad j\0\ ~ Oj, rd) 
P(type 2 \Oi ^ 0 2 , rd) 


P{type 2 \0\ ^ 0 2 ,rd) * P(notadj\O x = 0 2 ,rd ) 
+ P(type 2 \Oi ^ 0 2 ,rd) 


_ P(adj\Oi = 0 2 i rd) _ 

P(type 1 \0 1 £ 0 2 ,rd) + P{type 2 \O x £ 0 2 ,rd) 
+P(notadj\O x = 0 2 ,rd ) 


P(same|Oi = 0 2 ,rd) 
P(type x \Oi ± 0 2 ) 
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Piadf\Oi^Qt r rd) 

P(tvpei\Ot £ 0 3 ,rd) + j* Oa» rd) 

+P(nc4adj\Oi = O 3 , rd) 


Oj,rd) 

P(typei\Oi £ 0 3 ,rd) + P(tjfpe 7 \Oi # 0 3 ,rd) 
+P(notadj\Oi = 0 3 ,rd) 


and so the inequality holds. Similar reasoning tells us that: 

P{type 2 \Oi = Oi,rd) P(H/petlOi = 0„rd) 


P(type 2 |Oi ^ 0 3 ,rd) P(typt 3 lOt £ 0 3 >rd) 


This reasoning shows that our theory of grouping predicts that, all other factors being 
equal, groups with lower types of orientations have n greeter chance of coming from the 
same object. 
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Appendix D 


Arbitrary Constants Used in 

GROPER 


GROPER contains a number of arbitrary constants. We chose these constants before 
performing the tests described in chapter 9. This appendix collects them, for convenience. 

The maximum object diameter equals 300 pixels. GROPER uses this in determining 
the probability distribution of distances that separate two groups. Chapter 5 describes this 
in more detail. 

The probability that parts of two different objects intersect equals .2. GROPER uses 
this in determining the probability that a certain distance separates two groups that come 
from different objects. As chapter 5 describes, GROPER finds this probability using a linear 
combination of two probability distributions. This constant determines how GROPER 
combines them. 

When an indexing step produces 4 or fewer matches, GROPER performs verification 
on each match. 

The probability that two groups from the same object come from the same convex 
section of that object, adjacent convex sections, or non-adjacent convex sections = j. 
GROPER uses this to determine the likelihood of different type's of orientations occurring. 

The maximum diameter of a convex part of an object equals 150 pixels. If two groups 
have projections that intersect, but the distance from one group to the intersection of their 
projections exceeds 150 pixels, then GROPER decides they cannot have a type 2 relationship. 
In general, GROPER uses this value to decide how inti and inti effect the likelihood of 
two groups coming from adjacent sections of the same object. 

GROPER performs indexing with single convex sections of edges only if they contain 
at least four edges. 

If indexing shows that a pair of convex sections match many different sets of object edges, 
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GROPER next explores the five beet triplet of groups obtained by adding an additional 
group. 

GROPER allows a maximum error of 7 pixel* in the sensed location of any edge in the 
image. 

maximum error expected in the seated imgh of an edge is sin jg. 
lookup table for indexing quantizes distances to a multiple of 20 pixels, 
it quantise! angles to a multiple of 
GROPER accepts a match if it accounts for at least 28% of an ohject’s perimeter. 
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This report explores the use of grouping in object recognition by compu¬ 
tational systems. Many recognition systems in the past have performed an 
undirected search through the space of different segmentations of an image 
in order to recognize objects. This approach leads to significant problems 
of computational complexity and accuracy. The process of grouping deter¬ 
mines the sections of an image most likely to come from a single object. This 
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can tell a recognition system which segmentations of the image to consider 
first, improving both its speed and accuracy. 

The report describes a particular recognition system called GROPER. GROP¬ 
ER performs grouping by using distance and relative orientation constraints 
that estimate the likelihood of different edges in an image coming from the 
same object. The thesis presents both a theoretical analysis of the group¬ 
ing problem and a practical implementation of a grouping system. And 
it discusses the relevance of this theory of grouping to human psychology. 
GROPER also uses an indexing module to allow it to make use of knowl¬ 
edge of different objects, any of which might appear in an image. We test 
GROPER by comparing it to a similar recognition system that does not use 
grouping. 
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